This is a difficult text to write. Not because there’s a lack of resources about it - on the contrary, too much has already been written about it. It’s also not because this is a particularly complex subject, even though some of the finer details might require knowledge of advanced mathematics (thankfully, we won’t get to that). The difficulty here stems from the fact that the well has been poisoned. And the well we’re talking about is language itself.
It’s undeniable that humans are keen on anthropomorphising: things are easier to understand if we draw close parallels with what we know already; and given the obsession with itself that characterizes modern man, the easiest path is to draw parallels with ourselves. This has two obvious pitfalls: on one hand, it muddies the waters, because it ascribes symbolism where there might be none - “artificial intelligence”, really? On the other hand, it also causes an identification with the subject at hand; if the machine is intelligent, and if I am intelligent as well, then we’re probably more alike than not.
Here it might be helpful to start with the basics. The examples we’ve seen recently of the so-called “AIs” (viz., OpenAI’s ChatGPT, Google’s Bard, etc.) are more precisely called Large Language Models (LLMs). But just as the map is not the territory, the model is not the language. What we mean by this is subtle, but important: the algorithms behind these LLMs don’t learn a language the way we do. There’s no community aspect to it, nor is there trial and error, or interaction with the world. On the contrary, it’s pure and simple ingestion of data to feed a very sophisticated statistical model.
It’s also the case that the model doesn’t use language in the same sense that we do. For humans, language has several functions: not only is it a way for us to communicate with one another, it’s also a tool for understanding the world, ascribing meanings, creating identities, and so on. None of this happens for an LLM; the model is only concerned with language insofar as language is what it was trained on. You could (and some have) train the exact same algorithms to perform the same functions on machine code, and they’ll chug along just fine.
This is because what the software is doing with language is nothing more than statistical analysis. When someone asks ChatGPT “What is the size of the Sun?”, there’s no interpretation to speak of - there’s no concept of what the Sun is, or of what constitutes size. Rather, there’s a statistical function running in the background, calculating the group of words that maximizes the probability of a given answer being accepted.
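To make that concrete, here is a minimal sketch in Python of what “calculating the next group of words” means. Everything in it - the toy vocabulary, the scores, the function names - is made up for illustration; a real LLM computes these scores with a neural network over tens of thousands of tokens, but the mechanism is the same: score every candidate token, turn the scores into probabilities, sample one, repeat.

```python
import math
import random

# Hypothetical scores for a handful of candidate next tokens. In a real model these
# come from billions of learned parameters; here they are hard-coded purely to
# illustrate the mechanism.
def toy_next_token_scores(context: str) -> dict[str, float]:
    return {"about": 2.1, "roughly": 1.3, "1.39": 2.8, "million": 1.9, "banana": -3.0}

def softmax(scores: dict[str, float]) -> dict[str, float]:
    # Turn raw scores into a probability distribution over the candidate tokens.
    m = max(scores.values())
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_next_token(context: str) -> str:
    # Pick the next word by sampling from the distribution; generating an "answer"
    # is just repeating this step, appending each sampled token to the context.
    probs = softmax(toy_next_token_scores(context))
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=list(weights), k=1)[0]

print(sample_next_token("What is the size of the Sun?"))
```

No step in that loop involves knowing what the Sun is; it only involves knowing which tokens tend to follow which.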
The “raw” model, after ingesting the data, then goes through further refinements to guarantee that the probability function determining the next answer is fine-tuned to increase the likelihood of acceptance - and there are several methods to achieve this. The most successful has been asking humans to help give the algorithm the biases that most resemble human behavior - a process called “reinforcement learning from human feedback” - with humans now asking LLMs to perform this task themselves. A veritable ouroboros, but one that highlights the ultimate goal: the substitution of the human with the machine.
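As a rough illustration (and only that), the core of the “human feedback” step is often a pairwise comparison: raters pick which of two answers they prefer, and a reward model is trained so that the preferred answer scores higher. The sketch below assumes that setup; toy_reward, the length-based scoring, and the numbers are placeholders, not any lab’s actual pipeline.

```python
import math

# Stand-in for a learned reward model; in practice this is a neural network trained
# to predict how "acceptable" human raters would find an answer. The length-based
# scoring here is a placeholder, not anything a real system does.
def toy_reward(answer: str, weight: float) -> float:
    return weight * len(answer)

def preference_loss(chosen: str, rejected: str, weight: float) -> float:
    # Pairwise objective commonly used for reward models:
    #   -log sigmoid(r_chosen - r_rejected)
    # Lowering this loss pushes the reward model to score the preferred answer higher.
    diff = toy_reward(chosen, weight) - toy_reward(rejected, weight)
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# A human rater preferred the first answer over the second; fine-tuning would adjust
# the model's parameters (here, just `weight`) to make this number smaller.
print(preference_loss("The Sun is about 1.39 million km across.", "Banana.", weight=0.1))
```

Nothing in this step teaches the model anything about the Sun either; it only nudges the probability function toward answers that humans tend to accept.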
And if this wasn’t enough, there’s also the process called prompting, which gave rise to the job title “prompt engineer”. This is when a human gives the machine explicit written instructions. “You’re a help bot, and you will only give relevant factual information at the behest of the user”, or somesuch. This is the last step in guaranteeing, as much as possible, that the illusion is maintained. The fact that all conversational LLMs have hidden prompts is telling, for as soon as they were discovered by users, they were promptly exploited.
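For those curious about what that looks like in practice, here is a sketch of the usual shape of it, assuming the message-list format popularized by chat-completion APIs; the exact field names vary by vendor, and the prompt text is just the example from above.

```python
# A minimal sketch of how a hidden prompt is typically injected: the service prepends
# a "system" message, which the user never sees, before the user's text reaches the
# model. The role/content structure mirrors common chat-completion APIs, but the
# field names and the prompt wording are illustrative assumptions.
HIDDEN_SYSTEM_PROMPT = (
    "You're a help bot, and you will only give relevant factual "
    "information at the behest of the user."
)

def build_conversation(user_message: str) -> list[dict[str, str]]:
    # The first entry is invisible to the user; it is added by the operator to keep
    # the model's output inside the intended illusion.
    return [
        {"role": "system", "content": HIDDEN_SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

print(build_conversation("What is the size of the Sun?"))
```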
That’s the trick being played here. And that’s why these models are so impressive (most of the time): by being able to give the answers we are expecting to receive - or, at least, to give plausible answers - the software gives the impression of being able to use language in the same way that we do. But there’s no intent on its side (how could there be, really?). This explains why there have been so many instances of, on one hand, people being fooled by the supposed conversational abilities of the chat agents powered by LLMs, and, on the other, people stunned when the machine starts hallucinating (which is just a mismatch between what we expect from the context of the conversation and the result of the probability-maximizing function).
Of course, those hallucinations are just a stark reminder of what’s actually happening in the background, which is why they capture our attention as much as the conversational part does. But, unfortunately, it seems like they’ve been misunderstood; these aren’t bugs, but rather features of the algorithm. There’s no emergent behaviour being displayed here. When the chat agent starts to give us nonsensical answers, it’s behaving (from a computational perspective) in exactly the same way as when the answers make sense. It’s the exact same black box; we’re just confronted not only with its technical failures, but also with the mismatch between our expectations and the reality of what’s happening in the background.
I can already hear the rejoinder: “But if we improve the algorithm, and the training data, and the fine-tuning, this can only improve”. Which is true, in the sense that there are still technological advancements to be made - we can make these models better, bigger, faster, more interactive, and less prone to hallucination. And there’s an obvious danger here: making the algorithms better, in this context, means making them more verisimilar. The obvious call would be to restrict their usage, and we’ve seen it already, with the industry’s leading figureheads lobbying for something along those lines.
But this is far from being the biggest danger posed by the problem at hand. If anything, having big corporations trying to make money with an AI will only make it more obvious that these are not good tools, whatever their purported purpose. What’s actively dangerous is the opposite approach. It’s pretty much guaranteed that, given the enthusiasm and the promise of the technology, the “internet” will do its thing and create a solution for this. Instead of a few big, one-size-fits-all, ruled-by-middle-managers solutions provided by corporations, we will have networks of small, highly specialized, chained models that can be fine-tuned to an incredible level of detail, all hosted in whatever environment you want - either managed by third parties, or on hardware and networks you control.
That’s when the illusion will reach its maximum splendor. The shadows cast by the machine will be so truly enchanting (if they’re not already) that we will not even think that there’s any other way. At that point, there will be no choice left other than letting ourselves go, or realizing that the illusion is, well, not real.
Further reading: