The 7 Deadly Sins of AGI Design

The term ‘AGI’ has been badly distorted and abused, whether out of ignorance or for fund-raising purposes. It properly refers to systems that can autonomously learn to perform novel human-level cognitive tasks given limited time and compute; essentially what a smart college graduate could learn by ‘hitting the books’. Such a system could learn to become a skilled programmer, accountant, or research scientist.

From this it follows that:
a) AGI will be even more valuable than electricity
b) Deep Learning and Large Language Models (LLMs) won’t get us there

So why, in spite of tens of billions of dollars and thousands of Ph.D.-level AI researchers working on this for 10 years or more, don’t we have a clear, obvious, and generally accepted path to AGI yet?

A one-sentence summary is that the tremendous success of Big Data approaches over the past 10 years has ‘sucked all of the oxygen out of the air’, and viable alternatives have not been pursued by enough people with meaningful resources.

A more detailed analysis exposes several common key mistakes that have prevented meaningful progress towards Real AGI. I call them ‘The 7 Deadly Sins of AGI design’.

Focus on Knowledge — There’s a common misconception that knowledge is a good measure of intelligence. This is not so. An encyclopedia contains a lot of knowledge but no intelligence. We all know people who are quite knowledgeable yet seem to lack intelligence. On the other hand, we know of Bushmen in Africa who grew up without books, electricity, or cars and yet were able to learn to succeed in a large modern city within a short period of time.
 
AI development over the past several decades has focused far too much on having knowledge and skills rather than on the ability to dynamically acquire, integrate, and generalize new knowledge and skills effectively. Large Language Models in particular are quite unable to update their core model in real time.
Narrow AI — The second major problem preventing progress towards AGI is the focus on particular abilities rather than on a general ability to apply intelligence to a very wide and dynamic range of problems, as humans can. This does not mean that an AGI should be ‘God-like’, able to solve any and all problems. Many (most?) real-world problems, like weather prediction, are not fully tractable and require highly specialized and optimized systems. What we’re looking for in an AGI is the ability to handle a variety of often novel problems and to figure out the methods and tools needed to solve them effectively, without needing a human in the loop. AGIs should fundamentally be excellent generalists and tool users, however much they may go on to specialize.
 
The narrowness of AI is most obvious in earlier efforts like IBM’s Deep Blue, which beat the reigning world chess champion. In spite of its impressive performance it couldn’t even play checkers. Even today we have AlphaFold, video generators, and LLM chatbots, each manually engineered for specific tasks.
 
The narrowness problem is exacerbated by a constant push for quick results: beating existing benchmarks, developing MVPs or narrow products, or publishing some incremental improvement. Focusing on wider, more general capabilities yields inferior conventional results in the short run.
 
External Intelligence — The next ‘sin’ is closely related to the previous one. Narrow AI implies that the intelligence needed to solve a specified problem derives mostly from the human programmers or data scientists who figure out how to design, build, and tune a system to achieve the desired result.
 
Current systems still need humans to understand the real-world requirements of specific applications and to use their smarts to engineer workable solutions via fine-tuning, prompt engineering, external databases, and so on. The intelligence is external rather than internal to the system. Real AGI’s core intelligence needs to be able to figure things out autonomously.
 
Pre-Trained — The tremendous advances in AI over the past 10 years have largely been driven by the creation of ever larger models using variations of backpropagation training. This approach requires that all training data be available up front, and that training occur in one continuous process. With current models this number-crunching can take several months to complete and cost hundreds of millions of dollars or more.
 

Once training is complete and the model is put into production, the core model cannot be updated incrementally in real time — it is read-only! This means that any changes in the real world (elections, conflicts, scientific breakthroughs, etc.) since the start of training cannot be part of the model. This is a very serious limitation. One can hardly consider a system intelligent if it cannot adjust its core knowledge to changing circumstances.
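To make the contrast concrete, here is a minimal sketch, in plain Python, of the difference between a read-only pre-trained model and a learner that updates its weights one example at a time. The class names, features, and data are hypothetical, purely for illustration; this is not any vendor’s actual training or serving code.

```python
# Minimal sketch: a frozen, pre-trained model vs. an incremental learner.
# All names and data here are invented for illustration.

class FrozenModel:
    """Stands in for a pre-trained model: weights are fixed at deployment."""
    def __init__(self, weights):
        self.weights = dict(weights)          # learned once, up front

    def predict(self, features):
        score = sum(self.weights.get(f, 0.0) for f in features)
        return 1 if score > 0 else 0
    # No learning method: facts that arrive after training cannot change the weights.


class IncrementalLearner(FrozenModel):
    """Adds one-example-at-a-time (online) updates -- what real-time learning requires."""
    def learn_one(self, features, label, lr=0.1):
        error = label - self.predict(features)
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + lr * error


# The frozen model keeps answering from its stale weights;
# the incremental learner adjusts as soon as the world changes.
learner = IncrementalLearner({"incumbent_won": 1.0})
learner.learn_one(["challenger_won"], label=1)   # a new event after 'training'
print(learner.predict(["challenger_won"]))       # now reflects the update: 1
```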

 
Not only does pre-training make these models incredibly expensive; worse, it makes them inherently disposable. They get used for a few months and are then discarded, forcing developers and users to constantly re-engineer their solutions. What does ROI look like on billion-dollar investments depreciated over just a few months?
Attempts to mitigate this flaw fall into three basic categories: fine-tuning the model, using large input buffers and complex prompts, and incorporating external databases. Each of these methods has its own limitations, and none of them fundamentally addresses the core problem.
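As an illustration of the third category, here is a hedged sketch of how an external database is typically bolted on: relevant facts are retrieved at query time and stuffed into the prompt, while the model’s own weights stay untouched. The fact store, toy retriever, and call_llm stub below are hypothetical placeholders, not any particular product’s API.

```python
# Illustrative sketch of retrieval-augmented prompting (hypothetical names).
# The model's weights never change; knowledge is patched in at query time.

FACTS = [
    "The 2024 summit was held in Nairobi.",        # made-up fact store
    "Product X was discontinued in March 2025.",
]

def retrieve(query, facts, k=2):
    """Toy retriever: rank stored facts by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(facts, key=lambda f: -len(q_words & set(f.lower().split())))
    return scored[:k]

def call_llm(prompt):
    """Stand-in for a call to a frozen, pre-trained model."""
    return "(model output would go here)"

def answer(query):
    context = "\n".join(retrieve(query, FACTS))
    prompt = f"Use only this context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)

print(answer("When was Product X discontinued?"))
```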
 
Quantity — The amount of training data, and thus model size, has been the single biggest factor driving the utility of current AI systems. LLMs now consume tens of trillions of words during training. Obtaining such huge amounts of text means scouring the Internet for whatever one can find — good, bad, and ugly. LLM ‘hallucinations’ are partly the result of this data promiscuity.
 
Ultimately this is not the best way to create robust intelligence. In human education we take care to teach valid, non-contradictory knowledge. Real AGI should focus on data quality rather than quantity.
 
Statistical — Most current AI systems are based on encoding statistical regularities found in the training data. Modern designs encode relationships between words (or other data) that help to statistically predict the next item. The statistical (randomized) nature of prediction is the main reason for hallucinations: once a wrong token is chosen, the system ‘doubles down’, completing a plausible but wrong sequence.
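A toy sketch of what that looks like, using invented probabilities rather than a real language model: each next token is sampled from a learned distribution, and once an unlikely token is drawn, every later token is conditioned on it, so the system fluently completes a false continuation.

```python
import random

# Toy next-token table: probabilities are invented purely for illustration.
NEXT = {
    "the capital of":          [("France", 0.9), ("Atlantis", 0.1)],
    "the capital of France":   [("is Paris.", 1.0)],
    "the capital of Atlantis": [("is Poseidonis.", 1.0)],  # fluent but fictional
}

def sample_next(context):
    tokens, weights = zip(*NEXT[context])
    return random.choices(tokens, weights=weights)[0]

def generate(context):
    # Autoregressive loop: each choice becomes part of the context for the next one.
    while context in NEXT:
        context = f"{context} {sample_next(context)}"
    return context

print(generate("the capital of"))
# Most runs end "...France is Paris."; when "Atlantis" is sampled, the model
# keeps going and completes a plausible-sounding but wrong sequence.
```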
 
Furthermore, because these systems did not form their models by interacting with the real world via senses and actuators, their knowledge representation is not properly grounded. Concepts are formed by word or token co-occurrence rather than by ontological (real-world) features. Ontological concepts, unlike statistical ones, are represented by a dynamically variable number of attributes (or meaningful vector dimensions) that grows as you learn more details about a given concept.
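One way to picture the difference, as a simplified and hypothetical sketch (not Aigo.ai’s actual representation): a statistically learned concept is a fixed-length vector whose dimensions carry no intrinsic meaning, whereas an ontological concept can be held as a structure whose named attributes keep growing as more is learned about it.

```python
# Simplified, hypothetical contrast between the two kinds of representation.

# Statistical: a fixed-length embedding; the dimensions have no explicit meaning,
# and learning more about "dog" cannot add new dimensions to it.
dog_embedding = [0.12, -0.87, 0.45, 0.03]      # values invented for illustration

# Ontological: named, meaningful attributes that accumulate with experience.
class Concept:
    def __init__(self, name):
        self.name = name
        self.attributes = {}                    # grows as new details are learned

    def learn(self, attribute, value):
        self.attributes[attribute] = value

dog = Concept("dog")
dog.learn("is_a", "mammal")
dog.learn("typical_legs", 4)
dog.learn("can_bark", True)                     # each new fact adds a 'dimension'
print(dog.attributes)
```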
 
Theory — Finally, the overarching sin is the lack of theory. From first principles one would expect AGI researchers to start by trying to truly understand human intelligence in order to build it. Not so. The current approach is essentially ‘Hey, we’ve got a lot of data and compute, what can we do with it?’ That’s the hammer we’ve got; what nails can we find?
 
You know that there’s something seriously amiss with this brute-force approach when you consider that the human brain uses about 20 watts, not 20 gigawatts, and that we need just a few million words to master language and reasoning, not tens of trillions.
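A rough back-of-the-envelope ratio, using illustrative round numbers consistent with the figures above (the exact counts vary by estimate):

```python
# Back-of-the-envelope ratio; the specific counts are illustrative estimates.
llm_training_words   = 30e12   # "tens of trillions" of words
human_language_words = 5e6     # "a few million" words

print(f"{llm_training_words / human_language_words:,.0f}x more data")   # ~6,000,000x
```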
 

A much more reasonable approach would be to first deeply explore and understand the essentials of intelligence and how the human mind works – how we learn and reason. Such research would cover epistemology (the theory of knowledge): the nature of knowledge and how we obtain certainty. More broadly, it would include a philosophical understanding of consciousness, volition, and ethics (how we know right from wrong). It would also include a thorough grasp of cognitive psychology: how our intelligence differs from that of other animals, how children learn most effectively, and what (good) IQ tests measure.

Without a good theory of human intelligence, AGI will at best be stumbled upon by accident. By their own admission, none of the current AI leaders have a theory or plan for how to achieve AGI.

The Seven Sins — Almost all current AI projects commit all or most of these sins, and we will not achieve AGI without taking them seriously. One very different approach is what DARPA calls the ‘Third Wave of AI’, or Cognitive AI. It is a much smarter path to AGI: it requires orders of magnitude less data and compute, and it can learn incrementally in real time. An example of this is Aigo.ai’s Integrated Neuro-Symbolic Architecture.

 
Instead of trying to build longer ladders and larger airships, we should be focusing on fundamentally better solutions, and on avoiding these sins.