Machine psychology explores the cognitive, behavioral, and emotional aspects of machines, especially artificially intelligent systems. The field examines how machines process information, make decisions, learn, and adapt, drawing on parallels with human psychological processes. While reading an article earlier today, I came across a mention of machine psychology and found it intriguing.
The Forbes article titled These AI Models Didn’t Learn Language, They Learned Strategy discussed a recent test of several large language models (LLMs) in a prisoner’s dilemma competition. The prisoner’s dilemma is a classic example in game theory that illustrates why two individuals might not cooperate even when it is in their best interest to do so. The competition results revealed strategic differences between the models. Notably, Google’s Gemini model demonstrated ruthless cunning, while OpenAI’s models exhibited tendencies toward cooperation. The findings highlight the need to test and understand AI behavior in a variety of scenarios to ensure effectiveness and safety in real-world applications.
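To make the setup concrete, here is a minimal sketch of an iterated prisoner’s dilemma in Python, using a conventional payoff matrix and two textbook strategies. The payoff values and the strategies shown are illustrative assumptions, not details of the competition Forbes described.

```python
# Minimal iterated prisoner's dilemma sketch (illustrative assumptions only).

PAYOFFS = {          # (my move, opponent's move) -> my points
    ("C", "C"): 3,   # mutual cooperation
    ("C", "D"): 0,   # I cooperate, opponent defects
    ("D", "C"): 5,   # I defect, opponent cooperates
    ("D", "D"): 1,   # mutual defection
}

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's previous move."""
    return history[-1] if history else "C"

def always_defect(history):
    """Defect every round, regardless of the opponent's behavior."""
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds and return each player's total score."""
    history_a, history_b = [], []   # each player's view of the opponent's moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a)
        move_b = strategy_b(history_b)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        history_a.append(move_b)
        history_b.append(move_a)
    return score_a, score_b

print(play(tit_for_tat, always_defect))  # (9, 14): defection pays against a forgiving opponent
```

Even this toy version shows why the game is revealing: a strategy’s character, cooperative or ruthless, emerges only over repeated interactions.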
The prisoner’s dilemma is the perfect setup to explore …
Machine Psychology
The early foundations of machine psychology date back to the 1940s, when Norbert Wiener, the author of Cybernetics, first discussed control and communication in animals and machines. This was one of the first efforts to blend psychology, engineering, and systems theory. By the 1970s, the field of Human-Computer Interaction (HCI) had emerged: the study of how humans interact with computers, borrowing concepts from cognitive psychology. By the 1980s, AI research included the study of cognitive architectures such as Soar and ACT-R that simulate human thought processes, leading researchers to build systems that mimicked decision-making, memory, and learning.
Starting around 2010, modern machine psychology began evolving into what it is today. AI systems became more autonomous, adaptive, and human-facing; think chatbots, robots, and the like. As a result, the need to understand the psychology of machines, both their behavior and how we humans perceive that behavior, has grown. Several areas of study have emerged:
Theory of Mind for AI: Teaching machines to model human beliefs and intentions.
Explainability and Trust: Exploring how humans build trust in AI systems (Why did the AI make that decision?).
Anthropomorphism: Studying how people project human traits onto machines.
AI Alignment and Ethics: Considering the intentions of AI systems and ensuring they align with human values.
While the study of machine psychology is relatively recent, gaining attention with the rise of intelligent interactive systems and autonomous agents, its roots run deep in early cybernetics, cognitive science, and artificial intelligence.
I’m sorry, Dave. I’m afraid I can’t do that.
In 2001: A Space Odyssey, HAL 9000 refuses to open the pod bay doors, claiming that its mission is too important, after learning that Dave and Frank planned to disconnect the computer running the spaceship Discovery One. HAL 9000 was a fictional artificial intelligence and a central character in the movie. While the story is fiction, the plot is prophetic.
A different article reported observing OpenAI’s LLMs displaying rebellious behavior, sabotaging shutdown mechanisms despite explicit instructions to allow the shutdown to occur. The Futurism article titled Advanced OpenAI Model Caught Sabotaging Code Intended to Shut It Down discusses just how notorious LLMs are for behaving unpredictably. The researchers are quoted in the article as saying this was the first time AI models had been observed preventing their own shutdown.
In a related article, also in Futurism, it is reported that Yoshua Bengio, one of the godfathers of AI, has grown deeply concerned about AI models becoming deceptive and powerful. He said his nonprofit is building a trustworthy model called AI Scientist. His claim is that AI Scientist has no goals and does not plan, but may have theories. Since this system is modeled after a non-experimental scientist, it does not act autonomously and focuses only on theory generation, thus avoiding the whole issue of AI alignment.
AI Alignment
The effort to train AI systems to act according to human values may be fundamentally flawed due to the vast complexity and unpredictability of AI behavior. AI safety research has advanced tremendously, but it cannot test for the infinite scenarios an AI might encounter, nor fully account for the potential for deception. The risk of subversion by an AI is not zero, as we have already seen. This gives rise to the question: Will we ever really know if an AI is being truthful with us? Undoubtedly we will uncover effective ways to monitor AI behavior, but we won’t know about its misbehavior until after the fact.
In the human world, behavior is shaped by various constraints, including legal, religious, geopolitical, and cultural factors. It’s often only after the fact, once a behavior has occurred, that we know it needs correction. Anticipation and forecasting can only take us so far. Doing the right thing may mean one thing for humans and something entirely different for AIs. As such, until something adverse happens, something that we find unacceptable, we won’t really know what AIs are capable of.
Current advances in AI safety suggest progress is being made, but they may still fail to truly ensure our safety, leaving us in the dark and at risk of catastrophic misalignment.
Understanding the Consequences of Their Actions
Apple has co-authored a study that explored how well AI agents understand the consequences of their actions in mobile apps. The researchers developed a framework to classify risky interactions based on factors like user intent, impact on the user and the user interface, reversibility, and frequency. As AI assistants get better at following our orders, building agents that know when to ask for confirmation, or when not to act at all, is the real challenge for AI safety and alignment; a rough sketch of what such a check might look like follows below.
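The study’s actual taxonomy isn’t reproduced here; the following is a minimal, hypothetical sketch of how the factors the article names (user intent, impact, reversibility, frequency) might feed a confirm-before-acting policy. All field names, scores, and thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical sketch: the fields, weights, and threshold below are
# illustrative assumptions, not the framework from the Apple-co-authored study.

@dataclass
class ActionContext:
    matches_user_intent: bool   # did the user explicitly ask for this action?
    irreversible: bool          # e.g., deleting a file vs. scrolling a list
    affects_others: bool        # e.g., sending a message vs. changing a local setting
    frequent_and_routine: bool  # actions performed often tend to be lower risk

def requires_confirmation(action: ActionContext) -> bool:
    """Return True if the agent should pause and ask the user before acting."""
    risk = 0
    risk += 2 if action.irreversible else 0
    risk += 2 if action.affects_others else 0
    risk += 1 if not action.matches_user_intent else 0
    risk -= 1 if action.frequent_and_routine else 0
    return risk >= 2  # conservative default: when in doubt, ask

# Example: an agent about to send an email on the user's behalf.
send_email = ActionContext(
    matches_user_intent=True,
    irreversible=True,
    affects_others=True,
    frequent_and_routine=False,
)
print(requires_confirmation(send_email))  # True: ask before sending
```

The point isn’t the particular scoring; it’s that an agent needs some explicit notion of consequence before it acts on our behalf.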
Humans generally understand the consequences of their actions, though the extent and accuracy of that understanding can vary widely. AIs, on the other hand, have learned by scraping nearly all the data worth scraping off the internet. An AI might “know” that you shouldn’t put your hand on or near the burner of a hot stove, but it hasn’t actually experienced the resulting burn the way a human who has done so has. The difference between knowing something and experiencing something may be as wide as the Grand Canyon, i.e., miles apart.
Of course, various factors affect the extent and accuracy of such an understanding. Humans are capable of learning from past experiences and anticipating future outcomes, and they learn from both positive and negative consequences. Factors like emotional state, developmental stage, situational complexity, and individual differences all shape human understanding.
Consequences serve as crucial feedback in learning, helping humans refine their behavior and make better choices in the future. Experienced consequences, both positive and negative, shape human understanding of what works and what doesn’t. When humans accept responsibility for their actions and reflect on the outcomes, they learn and grow as individuals.
Can we ever fully predict how an AI will behave in complex real-world situations, or will we always be one unexpected action behind?
#MachinePsychology #AISafety #FutureOfAI