State of AI (LLM): Dec 2023

2023 was an amazing year for LLMs. Here is what is going on and what it might mean for the world.

Intro

I feel compelled to date this article because the field is evolving so rapidly that any perspective can quickly become stale.  At the end of 2023, here are some thoughts I have about the state of AI, and of LLMs more specifically.  Any of it could soon be outdated by some new advance.

Technology

For me, the two seminal moments in understanding LLMs came around GPT-3: 1) the model could generate reasonable text, calling to mind the insult “this essay looks like it was generated by GPT-3 or its human equivalent” (as models progress, that insult seems less insulting); and 2) the scaling laws of LLMs.  The last three years have followed from those two key truths.
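
For reference, the empirical finding is often summarized in the Chinchilla form of the scaling law (Hoffmann et al., 2022), a fit of loss against model and data size:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where N is the parameter count, D the number of training tokens, E an irreducible loss floor, and A, B, α, β fitted constants (the paper’s exponents both came out around 0.3).  The punchline is that loss falls smoothly and predictably as you scale either axis, which is what made the “just scale it” bet credible.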

It’s hard to predict outcomes early in a technological disruption.  There could easily be another paper with a seminal moment in this space, akin to the two above, that forces a large rethink.  Humans are bad at predicting exponential progress, and even worse when its direction is unknown.

One challenge in prediction, especially on the consumer side, is knowing where we may have the beginnings of a system solution.  Many fads, such as Lensa and similar tools, show how quickly these apps can fall out of use.  I remember thinking around 2018, when Facebook added a style-transfer photo feature, that it was the coolest thing.  After a couple of weeks, the novelty wore off and I stopped using it.  A friend told me that was the usage pattern across the app for that feature.  In any fast-moving space, what are the fads and what has staying power?  I think the reason OpenAI can command a valuation close to $90B in a 7% interest-rate environment is that ChatGPT seems to have cleared the fad stage.

For the rest of the consumer space, I don’t see a clear product that has cleared that stage.  Maybe the character.ai approach and other AI companions have some staying power.  It’s possible the long-term consumer product looks very different from anything we have today.  Perhaps the biggest impact will be in gaming.

Business and SaaS

The B2B market is where you see a lot of effort expended by many of the foundation model labs as well as the large cloud providers.  Business buyers are a different breed: they look for value exchange and are often much more “rational” about purchasing, although they are subject to different incentives. 

This is the big Copilot bet Microsoft is taking.  The idea is to integrate data from all of the Microsoft services (Outlook, Word, Excel, etc.) as inputs for one agent.  The key to AI in the business channel is that it has to deliver a ton of value, and that value, at some level, needs to be rigorously measured.  I find that rigorous measurement is a deeply undervalued skillset and mindset.  Some estimate of the productivity gains will be the reason a company pays $30/seat (or more, if the costs run high) for such a service.
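
As a toy illustration of why that measurement matters, here is the kind of back-of-the-envelope break-even calculation a buyer might run; all the numbers below are hypothetical placeholders, not measured values:

```python
# Back-of-the-envelope break-even for an AI copilot seat.
# All numbers are hypothetical placeholders, not measured values.

seat_cost_per_month = 30.0    # list price per seat, e.g. $30
loaded_hourly_cost = 75.0     # assumed fully loaded cost of an employee-hour
hours_saved_per_month = 0.5   # the quantity that actually needs measuring

value_per_month = hours_saved_per_month * loaded_hourly_cost
break_even_hours = seat_cost_per_month / loaded_hourly_cost

print(f"Estimated value: ${value_per_month:.2f}/month vs. ${seat_cost_per_month:.2f} seat cost")
print(f"Break-even at {break_even_hours:.2f} hours (~{break_even_hours * 60:.0f} minutes) saved per month")
```

At these assumed numbers, the tool pays for itself after about 24 minutes of saved time per month; the hard part is not the price but credibly measuring that those minutes are actually saved.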

What Is the Value?

So this leads to the key question: what is the value of using an LLM?  I think it comes primarily from knowing what’s likely in a context (the basis of the transformer next-token approach).  That means for your use case, you need to understand where returning a high-likelihood solution is useful.
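
To make that concrete, here is a minimal sketch of the next-token view, using GPT-2 via the Hugging Face transformers library purely because it is small and public; the model literally returns a probability distribution over likely continuations of the context:

```python
# Minimal sketch: an LLM as a "what is likely in this context" machine.
# Uses GPT-2 via Hugging Face transformers as a small, public example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "The patient reports a fever, cough, and"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Probability distribution over the *next* token, given the context.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item()):>12s}  p={prob.item():.3f}")
```

Everything a product does with an LLM is, at bottom, some way of cashing in that distribution.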

For example, given general medical symptoms, a fine-tuned LLM-based system can return the most likely reasonable diagnosis.  In legal discovery, it can return the items most likely to be flagged in similar situations.  In marketing, it can produce reasonably compelling copy for a product.  And so on.  There is some evidence that, at the current stage, AI-based tools help untrained or low-skilled workers catch up to the median worker, but do little for the best in a field; as you would expect, transcendent performance is much better than “reasonable.”  A simple analogy: YouTube lets people take on far more DIY projects, at much higher quality, than they otherwise could.

Additionally, I think you need to treat hallucinations as a feature, not a bug.  That means the key skill for working with these tools is discernment: at a fundamental level, you need to be able to tell whether the LLM gave you a good answer.  For many tasks a reasonable answer will be fine, so businesses should automate those parts (or add a discerning human in the loop), freeing people to focus on the core tasks where they add the most value.  Thinking in systems, and identifying the new skills and process bottlenecks, is how you reinvent an organization to get the most value out of LLMs.
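
One way to operationalize that discernment is a simple routing pattern: automatically accept output a verifier is confident in, and escalate the rest to a person.  A minimal sketch, where generate(), score_confidence(), and human_review() are hypothetical stand-ins for your model call, your verifier, and your review queue:

```python
# Sketch of a "discerning human in the loop" pattern. The helpers are
# hypothetical stand-ins: generate() would call your LLM, score_confidence()
# is whatever verifier you trust (a second model, rules, retrieval checks),
# and human_review() queues the item for a person.

CONFIDENCE_THRESHOLD = 0.9  # tune per task: how costly is a wrong answer?

def generate(prompt: str) -> str:
    return "a reasonable draft answer"        # placeholder for a model call

def score_confidence(prompt: str, draft: str) -> float:
    return 0.5                                # placeholder for a real verifier

def human_review(prompt: str, draft: str) -> str:
    print(f"Escalating for human review: {prompt!r}")
    return draft                              # a person edits/approves here

def handle_task(prompt: str) -> str:
    draft = generate(prompt)                  # the LLM proposes an answer
    if score_confidence(prompt, draft) >= CONFIDENCE_THRESHOLD:
        return draft                          # automate the routine cases
    return human_review(prompt, draft)        # route the rest to a human

print(handle_task("Summarize this contract clause."))
```

The threshold is where the systems thinking lives: it encodes how expensive a wrong answer is for that particular task.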

Moats

So it seems very clear to me that LLMs will generate a lot of value for companies and consumers.  However, it’s far less obvious that the industry of generating and serving LLMs will capture much of that value.  I am already observing incredible commoditization, and open-source models are often sufficient substitutes for proprietary foundation models.  Disclaimer: GPT-5, Gemini V2, Claude 3, etc. could of course offer a step change that rebuilds a moat.

For now, I don’t believe I have seen any lasting moat.  It’s just too easy (despite the GPU and infrastructure costs) for others to train a comparable foundation model, so the model itself doesn’t build a moat for a business.  I do think ChatGPT’s consumer mindshare is one such moat, but even there they don’t seem to be running at a profit; there’s too much cost pressure.  Nat Friedman characterizes the space as one of preference rather than any strong quality differentiator.

A stronger moat is a full-service lock-in for a large cloud client.  There I think the largest tech companies are better positioned to capture the value.  This of course argues that in its current stage, AI may be more of a sustaining innovation.  We’re still missing the true system solution.  However, it’ll probably take the form of something that removes significant friction from the application of LLMs.  A chatbot is not a great interface, and certainly not one I enjoy having to open multiple times a day.  This is probably why Sam Altman is famously looking into hardware to build a moat.

I’m personally very wary of any startup that aims to capture “a lot of the value” in this space.  A few might, but there’s far more VC money (raised when rates were low) than there are true opportunities.  If you can build a wrapper on top of an LLM in a weekend, keep in mind that so can others.  Moats do matter for viability, and I’m not sold on “being there first” as the only answer.  If you feel differently, I’d be very curious to hear what I’m overlooking.

Future: Technical

Inspired largely by Andrej Karpathy’s excellent talk, I think the upcoming technical trends will take place along the following dimensions:

  • Scale – continue to scale up the models (the scaling laws still seem to hold?), although doing so is getting prohibitively expensive.
  • Data – the example of Mistral seems to indicate that, for a given task, a small number of particularly cogent examples can go a long way, meaning dataset quality will really matter for a task.
  • Planning – the idea of System 2 thinking from the talk, where agents search across trees of options and return the best one (a minimal sketch follows this list).  The biggest problem is just how exponentially expensive this becomes; I wouldn’t be surprised if a single run costs dollars on current infrastructure.  In the early stages, if these have a low hit rate, the inference cost may be too much.  That said, like business consultancies, perhaps there’s a role for caching (case studies) and retrieval (expert consultants) to approximate many of the stages.
  • Agent orchestration – like the GPT store, the idea is to have specialized LLMs work together.  This suffers from some of the same cost issues as planning, although less extreme.  There’s also some evidence (like chain-of-thought) that some LLMs (or even classification models) can guide better generation.  This might be a fast way to add new capabilities without training new models.
  • Interpretability – Anthropic has a fascinating paper on monosemanticity, which I think can help to elucidate the proper training paradigm to reduce hallucinations or increase reliability.
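
As referenced in the planning item above, here is a minimal, hypothetical sketch of the simplest form of search: best-of-N sampling, where several candidates are generated and a scorer keeps the best.  sample_answer() and score() are stand-ins for real model calls; the scorer could itself be a second LLM, which is also the simplest version of the orchestration idea.

```python
# Minimal "System 2" sketch: best-of-N sampling, the simplest form of search.
# sample_answer() and score() are hypothetical stand-ins for a model call and
# a verifier; the verifier could itself be a second LLM (orchestration).
import random

def sample_answer(prompt: str) -> str:
    # Placeholder for a (stochastic) LLM generation call.
    return f"candidate answer {random.randint(0, 999)}"

def score(prompt: str, answer: str) -> float:
    # Placeholder for a real scorer/verifier.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # The cost issue is explicit here: n candidates means roughly
    # n times the inference spend of a single generation.
    candidates = [sample_answer(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Plan a database migration with minimal downtime."))
```

Real tree search would expand and score partial plans rather than whole answers, which is where the cost grows exponentially rather than linearly.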

There are other interesting technical directions, but those are the areas I’ll try to keep my eye on in the open-source and publication realms.  The technical reason I believe transformers work so well is that they are a generalized, scalable way of adding conditional (Bayesian) information.  Anything that increases that capability will be something I’m deeply interested in understanding.
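
Concretely, the mechanism doing that conditioning is attention; for completeness, the standard formulation from the original transformer paper (Vaswani et al., 2017) is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

where each token’s representation is re-weighted by its learned relevance to every other token in the context: a general, parallelizable way of folding in conditional information, and exactly the capability I’d want to see extended.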

Future: General

The more general implications of the advances in AI/LLMs are also of fundamental importance.  There are a ton of books, articles, podcasts, and so on where many brilliant people are ruminating on these ideas.  I’ll share three salient points here, though I’m sure to be leaving out many important ones:

  • Preparedness: it’s not clear we’re ready for a technology of this magnitude.  The Coming Wave is the best articulation of this upcoming seismic shift and how we can think about it.
  • The future of the labor market – I don’t know how this will disrupt the jobs of the future, or what to advise the younger generations about their careers.  One recent talk at Harvard puts the “end date” of computer science as a discipline at 2030.  Even if that’s not literal, it points in the right direction about the magnitude of the disruption.  I don’t know how to future-proof a career against it.  For now, I think the best trait to cultivate is a deep curiosity about the world and a desire to keep learning.
  • The future of education – I fully expect some large scale disruption to the nature of education.  The scaled, prepare-for-factory-life mass education systems do not do nearly as well as they could in educating the next generations.  Advances in the capabilities of “AI tutors” coupled with a deep understanding of individual motivation and psychology can revolutionize the process of learning, and may solve the curiosity challenge above.  

The AI revolution is upon us.  Life will be dramatically different in 30 years.  I hope we get there intact.


Additional resources: