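Artificial general intelligence (AGI) can be defined as AI that reaches human-level intelligence across a wide range of tasks. The arrival of AGI is often linked to the singularity, although strictly speaking the singularity refers to the runaway technological growth that might follow it.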
This article will lay out some of the fundamental requirements for AGI. It will then outline current progress and the remaining challenges.
For artificial general intelligence to be realised, the ability to generalise is core. For an AI, generalisation means being able to do, or learn to do, any task. For instance, an AI which can play chess but which cannot process natural language is not general, but narrow. It can be said that the more tasks an AI can complete, the more generalised it is.
Generalisation is a superpower found in most biological brains, with many animals showcasing their ability to perform many different distinct tasks whilst also having the ability to learn new ones and to adapt to new environments. But in particular, human brains are extremely good at generalisation. We can do a seemingly unlimited number of tasks, and learning new skills doesn’t decrease our performance in others.
This isn’t true for current neural networks. Currently, neural network training involves showing the network a large number of inputs, and then measuring the error between the network’s output (aka the prediction) and the desired output. The function that measures this error is called the loss function. Using this information, a process called backpropagation works out how to tune the network’s weights so that its predictions move closer to the desired output. Repeating this process millions of times trains a neural network.
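To make the loop above concrete, here is a minimal sketch in plain Python. It is only a toy: the "network" is a single weight, the data, learning rate and step count are made up for illustration, and the gradient is written out by hand rather than computed by backpropagation (which is the same chain-rule idea applied layer by layer in a real network).

```python
# Toy training loop: one weight, mean squared error, gradient descent.
# All values here are illustrative assumptions, not from any real model.

def predict(w, x):
    """The 'network': a single weight multiplied by the input."""
    return w * x

def loss(w, data):
    """Mean squared error between predictions and desired outputs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    """Derivative of the loss with respect to the weight."""
    return sum(2 * (predict(w, x) - y) * x for x, y in data) / len(data)

# Desired behaviour: y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0                # untrained weight
learning_rate = 0.05   # size of each adjustment
initial_loss = loss(w, data)

for step in range(200):
    # Nudge the weight in the direction that reduces the loss.
    w -= learning_rate * gradient(w, data)

final_loss = loss(w, data)
print(f"weight: {w:.4f}, loss before: {initial_loss:.4f}, loss after: {final_loss:.8f}")
```

The weight ends up very close to 2, which is the value that makes the predictions match the desired outputs. A real network does the same thing with millions or billions of weights at once.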
If this network is then trained on another task, the existing weights will change and will no longer give good results for the initial task. This problem is known as catastrophic forgetting. The network will also not perform as well on the new task as it would if it had been trained solely on the new task.
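The first of these effects can be reproduced with a toy sketch. The example below assumes a one-weight model and two made-up tasks (task A wants y = 2x, task B wants y = -3x); none of this comes from a real system. It only shows the loss of performance on the first task. The second effect, weaker performance on the new task, does not show up in a model this simple.

```python
# Toy demonstration of forgetting: train on task A, then on task B,
# and measure what happens to performance on task A.
# Tasks, learning rate and step count are illustrative assumptions.

def mse(w, data):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, learning_rate=0.05, steps=200):
    """Plain gradient descent on the mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w

task_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]     # y = 2x
task_b = [(1.0, -3.0), (2.0, -6.0), (3.0, -9.0)]  # y = -3x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)   # close to zero: task A is learnt

w = train(w, task_b)             # same weight, now trained on task B
loss_a_after = mse(w, task_a)    # large: task A has been overwritten
loss_b_after = mse(w, task_b)    # close to zero: task B is learnt

print(f"task A loss before training on B: {loss_a_before:.6f}")
print(f"task A loss after training on B:  {loss_a_after:.2f}")
print(f"task B loss after training on B:  {loss_b_after:.6f}")
```

Because the same weight has to serve both tasks, fitting task B pulls it away from the value that task A needed.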
Humans are also very good at zero-shot learning, which is where we tackle a task without any prior examples or teaching. We can do zero-shot learning across a number of different tasks. An example of this might be our ability to play games. We don’t learn to play each game from scratch, and we don’t have to practise a specific game to be able to play it. Instead, we use our knowledge of other games to help us. Large language models (LLMs) like GPT-3 show promising results when tasked with zero-shot and one-shot questions.
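For a language model, the difference between the two comes down to what goes into the prompt: a zero-shot prompt contains only the task and the question, whereas a one-shot prompt also includes a single worked example. The sketch below just builds the prompt text. The `build_prompt` helper and the "Input:/Output:" layout are hypothetical choices made for illustration, not the format of any particular model or API.

```python
# Build zero-shot and one-shot prompts as plain text.
# build_prompt and its layout are hypothetical, for illustration only.

def build_prompt(task, query, examples=()):
    """Return a prompt with the task, any worked examples, then the query."""
    lines = [task]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left blank for the model to complete
    return "\n".join(lines)

task = "Translate English to French."

# Zero-shot: no examples at all.
zero_shot = build_prompt(task, "cheese")

# One-shot: a single worked example before the real question.
one_shot = build_prompt(task, "cheese", examples=[("sea otter", "loutre de mer")])

print(zero_shot)
print("---")
print(one_shot)
```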
Currently, the most generalised neural network is GATO. GATO has been trained to complete 604 different tasks, which is highly impressive! Whilst GATO does suffer from the limitations discussed above (degraded performance), its ability to complete 604 different tasks is clearly a huge leap forward architecturally. GATO is a relatively small model, as its training was done as a proof of concept. This has led some AI scientists to believe that if GATO were trained on more data (a lot more), it would itself be able to reach AGI, or come close to it.
Knowledge transfer is where knowledge learnt in one domain is used in another. For instance, we learn to drive through practice. This is broadly how AIs learn too: from many examples with feedback (as in supervised learning). But we can also learn to drive through reading, conversing with others, watching videos, etc. We are then able to transfer the knowledge learnt from these different domains and put it into action.
Whilst GATO can process natural language and hold conversations, it cannot transfer knowledge between its domains. For example, what it has learnt from Wikipedia in its natural language domain is not available to its image recognition domain.
DeepMind provides a good explanation of the problem:
A child may recognise real animals at the zoo after seeing a few pictures of the animals in a book, despite differences between the two. But for a typical visual model to learn a new task, it must be trained on tens of thousands of examples specifically labelled for that task. If the goal is to count and identify animals in an image, as in “three zebras”, one would have to collect thousands of images and annotate each image with their quantity and species. This process is inefficient, expensive, and resource-intensive, requiring large amounts of annotated data and the need to train a new model each time it’s confronted with a new task. – DeepMind
DeepMind’s answer to this problem is Flamingo, a visual language model based on DeepMind’s current SOTA language model, Chinchilla. A compatible neural network was trained on image recognition, and was then combined with Chinchilla to produce Flamingo. This is a clear example of knowledge gained through one medium being used in another. But here, the scale is much smaller than what would be required for AGI, and hand-merging neural networks in this way is inefficient and not practical at scale.
Once a neural network has been trained, it is very difficult to modify it any further. The weights of the network get optimised to a point where its loss barely falls any further, and introducing new and different training data sets brings little progress.
This means that when a new piece of research or a new architecture comes out, it generally cannot be tested on models that have already been built. Instead, new models have to be trained from scratch. All of the old weights have to be chucked away, along with all of the compute that went into training them.
Humans obviously do not have this limitation. We do not have to wipe our brains every time we want to improve at something.
This limitation makes it very difficult to measure the impact of small changes, as doing so would be too expensive. It also means that the work done in training previous versions of an AI cannot be reused by future versions.
The best example of our progress on this front today again comes from Flamingo. This is the first time (AFAIK) that two AIs have been combined whilst still working efficiently. It is still not possible to build upon Chinchilla directly, but Flamingo showed that a separate network could be merged with it.
Currently, training a neural network is a separate process from using it. For example, when using a language model like GPT-3, nothing we tell it will change its weights. This means that it cannot permanently learn anything from what we type.
The ability of AIs to learn on the go is surely important for reaching AGI. For example, imagine an AI that has been trained to write legal documents. Now, if the required structure of a legal document changes, the AI cannot currently be shown the new structure and told to adapt to it. It would instead need to be retrained, often from scratch.
There are currently no public examples of any progress on this, but this is a highly active area of research.
Similar to the last fundamental, current AIs do not have any way to remember information. When you converse with a language model, it may appear to remember things. However, this is because it is context based. Each time you send the AI a message, it rereads the whole transcript in order to add context to its answer. This is not a perfect solution though, as a model like GPT-3 can only process around 2,000 tokens (word fragments, so roughly 1,500 words) in this way. This means that anything said earlier than that will not be used as context by the AI.
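The effect of this limit can be sketched in a few lines. The example below is a simplification under stated assumptions: it counts whole words, whereas real models count tokens, and the `build_context` helper and the tiny window of 8 words are hypothetical and chosen only to make the cut-off easy to see.

```python
# Toy context window: only the most recent words of the transcript are kept.
# build_context is a hypothetical helper; real models count tokens, not words.

def build_context(transcript, max_words=2000):
    """Join the messages and keep only the last max_words words."""
    if max_words <= 0:
        return ""
    words = " ".join(transcript).split()
    return " ".join(words[-max_words:])

transcript = [
    "My name is Ada.",
    "Tell me about otters.",
    "What is my name?",
]

# With a large window, the whole conversation fits.
full_context = build_context(transcript)

# With a tiny window of 8 words, the first message falls out of view.
short_context = build_context(transcript, max_words=8)

print(full_context)
print(short_context)
```

In the shortened context the name given in the first message is no longer visible, so a model reading only that context would have no way to answer the final question.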
This list is not exhaustive, and there are many more problems which need to be solved before we reach AGI. Even with current rates of progress, it is not clear whether this will happen within a decade or take more than a century. Nevertheless, the points discussed here are important for all areas of AI, and not just for reaching general intelligence.
I personally believe that AGI will require many more technical breakthroughs. Perhaps it will even be found that an ingredient for AGI is impossible to acquire through neural networks. For example, will it ever be possible for neural networks to remember information from outside of their training sets? And if so, how long will it take for the field to reach that point? My guess, at current rates of progress, is not too long at all…