The Turing Test is Dead, Long Live the Turing Test!
In his now-classic 1950 paper, “Computing Machinery and Intelligence,” British scientist Alan Turing proposed a thought experiment called the “Imitation Game,” played among three participants: a digital computer with adequate storage and computational speed, a human, and a human interrogator who poses questions to the other two. The participants, isolated in three separate rooms, communicate only by typing into a terminal. The computer wins the question-and-answer game if its answers fool the interrogator into believing it is human.
The simplicity of Turing’s Imitation Game — later dubbed the “Turing Test for computer intelligence” — soon captured the popular imagination. It still reigns as the gold standard for gauging whether a machine has equaled or surpassed human intellectual capacities. But it is important to note, as Turing points out, that the idea behind the game is not to establish whether a computer can “think,” but, rather, whether it can act — respond to the interrogator’s questions — indistinguishably from the way a human would. Also important to note is that the Turing Test is a subjective test; it doesn’t rely on any absolute metrics of intelligence. Rather, it is the subjective opinion of the interrogator that matters.
In his paper, Turing suggested several specimen questions that such an interrogator might ask. Among them, Turing lists an intriguing one: “[Can you] please write me a sonnet on the subject of the Forth bridge?”
The Forth Bridge is a railway bridge across the Firth of Forth in Scotland. Completed in 1890, it is considered a symbol of Scotland and is a designated UNESCO World Heritage site. But outside of Scotland, it’s not particularly well known. It may, however, be an appropriately obscure reference in the context of the Imitation Game and a worthwhile question to pose to the current crop of Large Language Models: ChatGPT and its ilk.
And, so, here is ChatGPT’s response to Turing’s question:
In Scotland’s land of awe and ancient tales,
There stands a mighty span of iron grace,
A testament to engineering’s trails,
A masterpiece that time cannot erase.
Oh, Forth Bridge! Your majestic arches soar,
Across the firth, where waters wild do flow,
Defying winds that fiercely gust and roar,
You steadfastly endure, a sight to show.
A symphony of steel, you proudly stand,
Connecting lands and dreams with every stride,
A symbol of a nation’s skillful hand,
A beacon of ambition, far and wide.
Oh, Forth Bridge! Your beauty spans the years,
A marvel, cherished, overcoming fears.
Try it for yourself. If the result’s too banal for your taste, generate another response. In every case, ChatGPT spits out a reasonable effort: not just passably good content, but content organized in the correct Shakespearean sonnet structure (three quatrains and a couplet). It’s better than anything I could have written. Thus, in my opinion as the interrogator, ChatGPT passes the Turing Test. In fact, you could say that the Turing Test is now obsolete.
It was never particularly useful. Turing himself never intimated that the Imitation Game could do anything more than assess whether a machine could act like a human in a precisely defined conversational format; the test has nothing to say about the larger issue of general machine intelligence beyond the constraints of format or context. Three decades after Turing’s paper, philosopher John Searle’s Chinese Room thought experiment illustrated the test’s limits. Searle argued that a computer could pass the Turing Test simply by manipulating symbols using rules associated solely with the syntactic properties of the symbols, not with their semantic properties (i.e., their meanings). In other words, while a computer is superbly capable of following a set of rules, it is incapable of “understanding,” a necessary requirement for human intelligence.
So long as machines very obviously lagged human capabilities, the limitations of the Turing Test hardly mattered. Over time, the test came to represent a gauge of machine intelligence in the broadest sense. But now it does matter. Telling machines from humans is particularly important now because the eternal human quest to create machines that match or exceed our own span of intellectual abilities is reaching a tipping point. Fifteen years ago, futurist Ray Kurzweil, currently a principal AI researcher at Google, predicted a “technological singularity” event — computers surpassing human intellectual capabilities — by 2045.
Kurzweil’s singularity isn’t achievable even by exponentially extrapolating the speed and power of today’s versions of AI. We would need machines that can successfully perform not just one task, but any intellectual task that a human being can, through the processes of learning, understanding and reasoning. We would need what is known as Strong AI or Artificial General Intelligence (AGI).
In contrast, every version of AI built to date, going back to Deep Blue, which bested world chess champion Garry Kasparov in 1997, is Weak AI. While ChatGPT conclusively outperforms humans on a specific task — passing the conversational Turing Test — all it’s good for is finding the most probable “next word,” based on historical patterns in the data it is trained on. In a 2021 paper, linguist Emily Bender of the University of Washington termed such a model a “stochastic parrot.” ChatGPT’s successor, GPT-4, is vastly more powerful and capable of visual processing. It’s also safer because it operates within certain guardrails — mechanisms, policies and practices defined by its developers to confine the range of its answers within the norms of ethics and morals. But it’s still Weak AI; Deep Blue can’t generate sonnets and GPT-4 can’t play chess.
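The “most probable next word” idea can be illustrated with a toy sketch. To be clear, this is not how ChatGPT actually works — real LLMs are transformer neural networks trained over subword tokens on vast corpora — but a minimal bigram model, built here on a made-up corpus, captures the stochastic-parrot principle: generate text purely by sampling from historical word-succession frequencies, with no understanding involved.

```python
import random
from collections import Counter, defaultdict

# Toy "stochastic parrot": a bigram model that picks each next word
# based only on how often words followed one another in its training
# text. (Illustrative corpus — any text would do.)
corpus = ("the bridge spans the firth and the bridge stands "
          "over the firth where the waters flow").split()

# Count, for each word, how often each successor follows it.
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def next_word(word, rng):
    """Sample a successor in proportion to its observed frequency."""
    counts = successors[word]
    if not counts:  # dead end: restart from an arbitrary corpus word
        return rng.choice(corpus)
    words, weights = zip(*counts.items())
    return rng.choices(words, weights=weights)[0]

def generate(start, n=8, seed=0):
    """Emit `start` plus n sampled words — fluent-looking, meaning-free."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        out.append(next_word(out[-1], rng))
    return " ".join(out)

print(generate("the"))
```

The output reads locally plausible because each adjacent pair occurred in the training text, yet nothing in the program represents what a bridge or a firth *is* — which is precisely Bender’s point, scaled down by many orders of magnitude.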
That’s not general intelligence. Not yet, at least. But given the tsunami of money flooding into AI development, it’s plausible that we could build Strong AI, as defined above, in the not-so-distant future. Economic incentives are a powerful spur for innovation.
Getting to Strong AI will require enormous — perhaps unaffordable — amounts of computing horsepower, perhaps even a new paradigm such as quantum computing, along with new hardware architectures capable of efficiently implementing new algorithmic approaches. Still, those are all “known unknowns” that, in the most optimistic scenario, would probably not require fundamental scientific breakthroughs. So it wouldn’t be surprising if Kurzweil were to update his prediction on the advent of the singularity.
But even imbued with Strong AI, machines still would not be fully human. They would be zombies in the philosophical sense: They’d look exactly like humans and behave in the same way as humans do, but they’d lack what psychologists call phenomenal consciousness. They would not feel anything “inside.” They’d be missing the very core of what it means to be human.
Let’s go back to Turing’s curious question. In his paper, Turing references a talk given in 1949 by a prominent neurosurgeon of the time, Geoffrey Jefferson. In his speech, “The Mind of Mechanical Man,” Jefferson discussed the differences between computers and the human brain: “Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain — that is, not only write it but know that it had written it. No mechanism could feel (and not merely artificially signal, an easy contrivance) pleasure at its success, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or depressed when it cannot get what it wants.”
Among consciousness researchers, the feelings Jefferson describes are known as “qualia” — first-person (subjective) experiences of phenomena. Qualia are what mental states — success, grief, sex, whatever — feel like.
While phenomenal consciousness is about feelings, access consciousness is about cognition, defined as the ability to learn, understand and reason. (Intelligence, for this discussion, is defined as the degree to which those cognitive abilities are manifest in an organism.) Access consciousness can be modeled by standard methods of cognitive science in terms of computational or neural mechanisms, which, in theory, means we can create algorithms and hardware architectures that mimic human cognition.
Not so with phenomenal consciousness. We have no idea what neural processes and mechanisms generate feelings. Phenomenal consciousness is “not computational,” says 2020 physics Nobelist Roger Penrose; he believes it relies on physical processes beyond our ken. What that means, very simply, is that we can’t build computers that have feelings because we don’t understand how feelings arise. The best we could ever do is build zombies: machines that appear human in every way — even physically — but are devoid of the spark that makes us human.
We’ve gotten used to working with Weak AI; the world is awash in weakly intelligent chatbots. But as we approach the singularity, the world is likely to be overrun with zombies. The test we need now — call it Turing II — is one that can tell humans from zombies.
If zombies are different from humans solely in their inability to have subjective experiences — to “feel” at the deepest level — then Turing II would not be a test of intelligence at all, but of phenomenal consciousness. The objective would be to detect “the spark that makes us human.”
It’s important to call this new test Turing II for several reasons: As in the original, Turing II would have to differentiate between a human and a machine pretending to be a human; Turing II would also retain the original’s simple question-answer format, an appealing feature that helped enshrine it in computing lore. A couple of things would be different: since zombies would physically resemble humans, there would be no need to isolate the participants, a key requirement of Turing I. Alternatively, the test could be carried out in the metaverse, with the responding parties assuming avatars of their choice.
Finally, the questions posed would aim not to gauge knowledge or intelligence, but rather to distinguish between true feelings and mere representations. That’s easier said than done, of course, since feelings are subjective experiences; the “inner” experiencer may or may not display “outer” indications of the experiences. Could we then pose questions clever enough to “peek behind the curtain” in some way?
Ultimately, our best interrogator for Turing II may not be a clever human at all; it could be a known zombie, exquisitely trained — on humans, by humans — to detect the faintest signs of genuine feelings simply through conversation. Instead of humans trying to identify zombies, let’s use the zombies themselves to identify their own kind.
Humanity is suffering from an epistemological crisis; our relationship to the truth feels ever more tenuous. Worse, within the span of a couple of decades we will have to contend with an invasion of crypto-humans. In that context, the development of epistemological tools such as Turing II is an imperative.