Random thoughts on Cognitive Psychology & Artificial Intelligence

As a junior student majoring in Computer Science, I enrolled in the PSYC2050-Cognitive Psychology course out of a desire to understand how humans so effortlessly perceive the world around them and how we came to possess the extraordinary intelligence that has set us apart from every other animal in this planet's history.

P/s: I use purple text to mark the questions/problems that I found intuitively intriguing and that I'm still working on.

Week 2 & 3: Basic Processes in Visual Perception & Object and Face Recognition

As we learned about the basic visual system, covering color, depth perception, and so on, I found that this system does far more than simply see the world. From the moment light waves enter the eyes, many processes run concurrently at an astonishing speed, and the brain then builds on that information to reconstruct a mental model of the world. The lecture made me revisit computer vision, which still lacks any real understanding of the 3D world, even when people train systems on every task humans can do: classification, depth prediction, semantic segmentation, and object detection. Because of this limitation, such systems remain far from being usable on robots. This led me to question whether the hidden dynamics of vision involve any built-in understanding of color or depth at all; perhaps these are simply consequences of humans learning to interact with the world from birth. In fact, research suggests that newborn babies do not yet interpret color or depth, and that those abilities develop gradually over time. I therefore started to believe that, to develop an outstanding machine vision system, we need to incentivize machines to develop their own vision systems, which might ultimately converge to the same vision system humans have.

Week 4: Attention and Performance

The current, almost miraculous development of AI has been hugely inspired by theories from neuroscience and cognitive psychology, and mimicking attention was the most significant step on the path to a model as powerful as ChatGPT. While the definitions of “Early Selection” and “Late Selection” clearly converge into the theory that semantic content drives attention, perceptual organization makes me ponder a lot.
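Before dwelling on perceptual organization, I wanted to make the attention connection concrete for myself. Below is a minimal NumPy sketch of scaled dot-product attention, the core operation inside the Transformer architecture behind models like ChatGPT; the toy shapes and the self-attention example are my own illustration, not something from the lecture.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: each query attends to all keys,
    and the output is a weighted mixture of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                          # blend values by relevance

# Toy example: 3 "tokens", each a 4-dimensional vector, attending to themselves.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)   # (3, 4)
```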

In terms of perceptual organization, we learned about the biases humans rely on when constructing meaningful parts from abstract pictures. But the question “Where do these biases come from?” has been stuck in my head. These biases don't show up in any concrete form around us every day, so why do we have them? Currently, I'm hypothesizing that these biases are the methods our brain uses to compositionally generalize over peculiar images by breaking them into smaller parts that we can make sense of. I also believe symmetry is another form of bias that we unconsciously exploit every day, even though it did not appear in the lecture content. I will definitely keep thinking about this behavior.

Then the concepts of “serial search” and the “pop-out effect” were introduced, and I found them astonishing and directly relatable to my current research on compositional generalization. Current generative text-to-image AI models lack this type of processing, which causes them to fall short when encountering complex text prompts. After the lecture, I quickly implemented the idea of “serial search” in generative AI. For example, given the input text prompt “red car and yellow bicycle”, I train the model to generate images for “red car” and “yellow bicycle” separately, and then, based on those generated images, produce a faithful final output. As expected, the models performed tremendously well, and I submitted the work to a major international conference.
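Roughly, the pipeline I have in mind looks like the sketch below. The functions generate_image and compose are placeholders for whatever generative backbone and composition step one uses; they are not a real API, and the prompt splitting is deliberately naive.

```python
# Hypothetical sketch of "serial search" applied to text-to-image generation.
# generate_image() and compose() stand in for a real model and a fusion step.

def split_prompt(prompt: str) -> list[str]:
    """Naively split a compound prompt into its atomic sub-prompts."""
    return [part.strip() for part in prompt.split("and")]

def serial_generate(prompt: str, generate_image, compose):
    """Generate each sub-concept separately, then compose the partial results."""
    sub_prompts = split_prompt(prompt)                          # ["red car", "yellow bicycle"]
    partial_images = [generate_image(p) for p in sub_prompts]   # one image per concept
    return compose(partial_images, prompt)                      # fuse into one faithful output

# Trivial stand-ins, just to show the call pattern.
fake_generate = lambda p: f"<image of {p}>"
fake_compose = lambda imgs, prompt: f"<composition of {imgs} for '{prompt}'>"
print(serial_generate("red car and yellow bicycle", fake_generate, fake_compose))
```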

Week 5 & 6: Classical conditioning & Operant conditioning

The lecture covered operant conditioning, emphasizing Skinner’s framework of learning through reinforcement and punishment, which shapes behavior using key processes such as determining base rates, training, extinction, and spontaneous recovery. Critical concepts include positive and negative reinforcement to increase behavior, and punishment to suppress it, with context cues (discriminative stimuli) playing a significant role. Advanced techniques like shaping and chaining were highlighted for building complex behaviors, while reinforcement schedules and superstitious behaviors demonstrated how patterns of rewards and expectations influence learning. The session also touched on vicarious learning, rooted in Bandura’s social learning theory, where individuals learn by observing others, showcasing the importance of modeling and the social context in behavior acquisition.

Operant conditioning and vicarious learning have direct parallels to Reinforcement Learning (RL) in AI, where agents learn optimal behaviors through trial-and-error interactions with an environment, guided by rewards and penalties. RL leverages the principles of positive and negative reinforcement to maximize cumulative reward, similar to Skinner’s focus on shaping and chaining behaviors through feedback mechanisms. However, a critical limitation of RL is its reliance on explicit reward formulations, which oversimplify the complexities of real-world environments. And if every ability really boils down to learning, as I pointed out in Week 2, I wonder what the main components are that incentivize humans to develop all forms of intelligence. Why did humans, with only a small biological difference from chimpanzees, evolve so successfully when every creature is born with no self-awareness of its environment? What motivates the development of the vision system?
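Circling back to the RL parallel itself, I think of tabular Q-learning as a tiny digital “Skinner box”: the agent's value estimates are shaped by reward in much the same spirit as reinforcement shapes behavior. The two-state environment, learning rate, and reward scheme below are arbitrary choices of mine, purely for illustration.

```python
import random

# Toy tabular Q-learning: a 2-state, 2-action "Skinner box".
# Taking action 1 ("press the lever") in state 0 is positively reinforced.
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    reward = 1.0 if (state == 0 and action == 1) else 0.0   # reinforce the lever press
    next_state = random.randrange(n_states)                  # environment moves on
    return reward, next_state

state = 0
for _ in range(5000):
    if random.random() < epsilon:
        action = random.randrange(n_actions)                          # explore
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])     # exploit
    reward, next_state = step(state, action)
    # Temporal-difference update: nudge Q toward reward + discounted future value.
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print(Q)   # the "lever press" action in state 0 ends up with the highest value
```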

Week 7 & 8: Short-term memory & Working Memory

Sensory memory and short-term memory were explained using the modal model of memory, which emphasizes their role in information processing. Sensory memory, including iconic (visual) and echoic (auditory) memory, is characterized by a large capacity and brief duration, holding unprocessed information just long enough for attention to transfer to short-term memory. Short-term memory, on the other hand, has a limited capacity—often cited as 7±2 items (Miller, 1956) or around four items in recent studies (Cowan, 2021). It temporarily maintains information through rehearsal, but distractions can easily disrupt this retention. Techniques like chunking improve its efficiency by organizing data into meaningful units.

As I reflect on the current state of AI systems, I can’t help but notice how far they are from replicating human sensory and short-term memory. In humans, sensory memory synchronizes inputs from all our senses—visual, auditory, tactile, and even olfactory—holding an incredible amount of raw data briefly in a pre-attentive state. This seamless integration allows us to prioritize and process information efficiently, transferring only what’s important to short-term memory for further manipulation. In contrast, AI systems process sensory inputs in isolation, without a shared temporal or contextual framework to unify them. For instance, when AI analyzes an image and an audio clip, it doesn’t inherently combine these data streams into a coherent sensory experience like we do.

Short-term memory, or working memory, is another area where AI lags behind. My own working memory lets me actively hold and manipulate information, whether I’m doing mental arithmetic or solving complex problems. It evolves dynamically, shaped by experience and attention, allowing me to refine ideas or strategies incrementally. AI, however, relies on a fixed context window, like the token limits in large language models. These windows feel static to me—they can store a limited sequence of inputs but don’t refine or adapt their stored information as I would when tackling a problem. If I want to rework an idea, I naturally iterate and build on it, whereas AI needs explicit refeeding of data to do the same. This lack of dynamic, integrated memory systems makes me realize how much more there is to achieve before AI truly mirrors human cognition.
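To picture for myself just how static this is, I think of the context window as nothing more than a bounded buffer: once it fills up, the oldest tokens simply fall off, with nothing consolidated or carried over. The snippet below is only a schematic of that intuition, not how any particular model actually manages its context.

```python
from collections import deque

class FixedContextWindow:
    """Schematic of an LLM-style context window: a bounded buffer of tokens.
    When full, the oldest tokens are silently dropped, never consolidated."""

    def __init__(self, max_tokens: int):
        self.buffer = deque(maxlen=max_tokens)

    def add(self, tokens: list[str]):
        self.buffer.extend(tokens)      # old tokens fall off the left end

    def contents(self) -> list[str]:
        return list(self.buffer)

window = FixedContextWindow(max_tokens=5)
window.add(["the", "red", "car", "and", "the", "yellow", "bicycle"])
print(window.contents())   # ['car', 'and', 'the', 'yellow', 'bicycle'] -- the earliest words are gone
```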

Week 9: Procedural memory

When I think about procedural memory, it reminds me of how we build skills over time through repeated practice, like riding a bike or typing on a keyboard. Once learned, these skills become automatic, requiring little conscious effort. In AI, this concept parallels transfer learning, where a model trained on one task can adapt its knowledge to a related task. For example, a neural network trained to recognize objects in one dataset can use its learned features to perform well on a different dataset with minimal additional training. Just as procedural memory allows me to apply skills flexibly in new situations, transfer learning enables AI to leverage prior knowledge efficiently, reducing the need to start learning from scratch every time.
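In code, the analogy I have in mind maps onto the standard transfer-learning recipe: keep the pretrained feature extractor frozen, like an already-automatic skill, and only train a small task-specific head. Here is a minimal PyTorch sketch, assuming a recent torchvision and a hypothetical 10-class target task; the random tensors merely stand in for a real dataset.

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse "procedural" visual features learned on ImageNet; train only a new head.
model = models.resnet18(weights="IMAGENET1K_V1")    # pretrained feature extractor

for param in model.parameters():
    param.requires_grad = False                      # freeze the learned "skill"

model.fc = nn.Linear(model.fc.in_features, 10)       # new head for the new 10-class task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One dummy training step on random data, just to show the loop shape.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```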

Week 10: Explicit memory

The theory of memory decay suggests that information stored in our memory weakens over time if it is not actively recalled or used. I hypothesize that this natural process serves an essential function in human cognition: it clears space for new knowledge while retaining the most valuable and frequently accessed information. Our brain, faced with finite resources, must prioritize efficiency, and memory decay might act as a filter to ensure that only relevant and useful information persists. This aligns with the idea that repeated use and reinforcement—through recall, practice, or emotional significance—consolidates memories into long-term storage, effectively marking them as important. Meanwhile, less critical or outdated information fades away, preventing our cognitive systems from being overwhelmed by unnecessary details. This dynamic balance reflects an adaptive mechanism that helps us stay focused on present and future challenges without being bogged down by an excess of rarely useful memories. So even though we call it long-term memory, does it possess an elasticity that paves the way for human adaptability? It’s not just about retention; it’s about evolution, enabling us to synthesize complex concepts, create innovative solutions, and draw meaningful connections between past experiences and future goals.
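To make this intuition a bit more concrete for myself, I played with a toy decay-with-rehearsal model: every recall deposits a fresh memory trace, and each trace decays exponentially afterwards. The decay rate and boost below are arbitrary numbers I picked purely to illustrate the shape of the idea, not parameters from the lecture or any paper.

```python
import math

def memory_strength(recall_times, t, decay_rate=0.1, boost=1.0):
    """Toy model: each recall at time r leaves a trace that decays exponentially;
    total strength at time t is the sum of all surviving traces."""
    return sum(boost * math.exp(-decay_rate * (t - r)) for r in recall_times if r <= t)

# An item rehearsed at t = 0, 5, and 10 stays stronger at t = 20
# than one studied only once at t = 0.
print(memory_strength([0, 5, 10], t=20))   # rehearsed item
print(memory_strength([0], t=20))          # never-rehearsed item
```

Whatever the exact functional form, it is this elasticity, where what gets used is strengthened and the rest fades, that I have in mind when I talk about adaptability.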

This adaptability is strikingly analogous to the concept of a “gene.” Just as genes encode information that evolves through mutation and natural selection, long-term memory evolves through exposure, reinforcement, and the pruning of unused connections. Genes allow the human race to adapt gradually over generations, shaping our biology to meet environmental demands. Similarly, long-term memory allows individuals to adapt within their lifetime, shaping cognition and behavior to navigate an ever-changing world. Both systems are not perfect archives but are instead optimized for resilience and flexibility. They prioritize what is valuable for survival and progress, whether it’s traits for an organism or ideas for a mind. Hence, is another factor hindering AI capabilities the reliance on digital computation, where every number and parameter is stored as an exact value? And if we want to move toward analog computation with this kind of elasticity, we need to revisit the question: what will incentivize the evolution we want?

Week 11&12: Reasoning, problem-solving, and creativity & Improving problem-solving

In week 11 and week 12, we went through strategies humans use to tackle complex challenges. Key methods include analogy, where solutions to past problems are applied to new ones; means-end analysis, which involves creating subgoals to bridge gaps between current and goal states; difference reduction, focusing on minimizing discrepancies between present and desired outcomes; and alternative problem representations to facilitate understanding. Techniques like incubation allow unproductive fixations to fade, enabling insights and novel connections through breaks in conscious problem-solving. Tasks like the Wason selection task and the mutilated checkerboard problem showcase how representation and reasoning strategies impact outcomes.
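To see what one of these strategies looks like as an explicit procedure, here is a toy sketch of means-end analysis in the spirit of Newell and Simon’s General Problem Solver: to achieve a missing condition, pick an operator that produces it, and if that operator’s preconditions are not yet met, treat them as a subgoal first. The “get to campus” problem and its encoding are entirely my own simplification.

```python
def means_end_analysis(state, goal, operators, plan=None):
    """Toy means-end analysis: for each condition missing from the goal, find an
    operator that achieves it; satisfy that operator's preconditions first
    (a subgoal), then apply it."""
    plan = [] if plan is None else plan
    state = set(state)
    for condition in set(goal) - state:
        op = next(o for o in operators if condition in o["adds"])       # reduces this difference
        state, plan = means_end_analysis(state, op["needs"], operators, plan)  # subgoal
        state |= op["adds"]
        plan.append(op["name"])
    return state, plan

# Hypothetical "get to campus" problem, encoded as facts to achieve.
operators = [
    {"name": "walk to bus stop", "needs": {"at home"},     "adds": {"at bus stop"}},
    {"name": "ride bus",         "needs": {"at bus stop"}, "adds": {"at campus"}},
]
_, plan = means_end_analysis({"at home"}, {"at campus"}, operators)
print(plan)   # ['walk to bus stop', 'ride bus']
```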

All these methods reflect different forms of compositional reasoning or compositional generalization. They rely on combining and reapplying smaller, well-understood knowledge elements to form complex solutions. Whether through analogies, breaking problems into subgoals, or restructuring representations, these approaches demonstrate the human ability to extrapolate and generalize across diverse scenarios. This type of reasoning is central to problem-solving but remains a significant limitation in current AI systems, as they struggle to replicate the flexible and creative compositional abilities inherent in human cognition.

Week 13 & 14: Human Intelligence & Artificial intelligence

Throughout these two weeks, as well as the whole course, I have come to fervently believe that compositional generalization—the ability to create and understand complex knowledge by composing primitive knowledge—is essential to how humans solve problems, from mathematics and physics to daily life tasks. Nevertheless, despite their astonishing performance in language and vision, current AI systems do not exhibit compositional generalization. Furthermore, pretraining and scaling AI systems are hitting a wall, necessitating a more efficient training regime that exploits the hidden structure of data, such as compositionality, and extrapolates easily to out-of-distribution data. Therefore, I believe this lack of compositional generalization is the key drawback hindering AI from significantly closing the colossal gap with human-level reasoning.