What’s next for LLMs in 2025?
Okay, so we’ve seen that LLMs, and the AI they power, have made huge progress over the last two years. However, they’re facing a number of challenges.
It’s my contention that these challenges will be solved – but LLMs (and the broader AI industry) are going to look very different by the end of this year.
The data solution
How are AI companies going to solve the data issue facing LLMs, then? There are a number of possible solutions.
The first is through simple brute force capitalism.
OpenAI has just completed the largest venture capital deal of all time.
In its latest funding round, the generative AI progenitor raised $6.6 billion at a valuation of $157 billion. That valuation makes OpenAI the third most valuable VC-backed company in the world, surpassed only by SpaceX and ByteDance (the owner of TikTok).
OpenAI now has the funds to buy almost any data it wants. In theory, the company could buy the entire back catalogue of Penguin Random House or the rights to tens of thousands of movies. This isn’t mere idle conjecture, either. Earlier this year, Meta (the parent company of Facebook and Instagram) briefly mulled over purchasing Simon & Schuster.
In short, AI companies may solve their current dearth of data with cold, hard cash.
I won’t discount that these companies may also solve the Habsburg AI problem, but in the near term I suspect they’ll prefer to just use their capital to acquire new data sources.
The domain-specific LLM solution
Okay, so we’re onto the crux of this article. There is an emerging type of LLM that can potentially leapfrog the obstacles currently standing in the technology’s way.
Domain-specific LLMs.
Domain-specific LLMs are language models that have been trained on data from a specific knowledge domain.
They are explicitly designed to excel within certain areas of expertise. Ask them the sort of general questions you would put to ChatGPT, and they’ll likely fail. But ask them a question in the domain in which they have been trained, and you’ll get a superb answer.
Domain-specific LLMs are already extant and being used ‘in the wild’. Examples include:
- ClimateBERT – an LLM that is trained on climate-related information.
- Med-PaLM 2 – an LLM designed to answer medical questions.
- BloombergGPT – an LLM created and trained on financial data for finance-specific tasks.
So, why do I think that domain-specific LLMs represent the future of LLMs more generally? It boils down to a couple of points.
Firstly, domain-specific LLMs have significantly lower development costs than the behemoth foundational LLMs of OpenAI and others.
Domain-specific LLMs typically start life as small language models that are then expanded and refined through fine-tuning. This results in models that are far more accurate and efficient within their specific knowledge domains.
In other words, instead of trying to consume an entire Internet’s worth of data, domain-specific LLMs are trained on smaller, domain-specific datasets – meaning they are cheaper (in energy terms) at both the training and inference stages.
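To make that concrete, here’s a minimal sketch of what building a domain-specific model can look like in practice – taking a small, open base model and fine-tuning it on a domain corpus. It assumes the Hugging Face transformers and datasets libraries; the base model choice and the corpus file name are purely illustrative.

```python
# Minimal sketch: adapting a small base model to a single knowledge domain
# by fine-tuning it on a domain-specific text corpus.
# Assumes the Hugging Face `transformers` and `datasets` libraries;
# the model name and file path below are illustrative placeholders.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "distilgpt2"             # a deliberately small base model
DOMAIN_CORPUS = "domain_corpus.txt"   # hypothetical domain-specific dataset

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Load the domain corpus and tokenise it.
dataset = load_dataset("text", data_files={"train": DOMAIN_CORPUS})["train"]

def tokenise(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenised = dataset.map(tokenise, batched=True, remove_columns=["text"])

# Standard causal language-modelling fine-tune on the domain data only.
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="domain-llm",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=5e-5,
    ),
    train_dataset=tokenised,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("domain-llm")
```

The point of the sketch is the scale: a corpus measured in gigabytes rather than an Internet’s worth of text, and a model small enough to train and serve cheaply.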
To put it bluntly: domain-specific LLMs are likely to be much more profitable (at least at the inference stage).
Aside from profit, domain-specific LLMs have the potential to be more effective at their given tasks. As a study3 from earlier this year pointed out, foundational LLMs are simply too big to be effective for targeted applications:
‘However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g. various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications)’.
From a marketing perspective, foundational LLMs have already made a number of cultural faux pas. Does anyone remember Gemini’s image generation controversy from earlier this year?
Domain-specific LLMs will allow marketers to create AI applications that better take into account the cultural nuances that the larger LLMs miss.
Sure, they lack some of the magic of generic LLMs – you can’t expect a domain-specific LLM to answer a question on any topic whatsoever – but, when correctly trained and optimised, they can be incredibly effective. Think of them as ‘subject matter experts’ rather than savants.
In some senses, this evolution parallels the development of human knowledge. From the Classical era through to the Renaissance, emphasis was placed on being a polymath – an individual who could excel in myriad fields of knowledge.
However, as the world globalised from the 19th century onwards and Ricardian economics became the prevailing model of trade, emphasis shifted to specialisation. Individuals would prosper economically by becoming highly specialised in certain fields of knowledge.
As this story makes clear, we effectively went from human foundational LLMs to human domain-specific LLMs.
It makes economic sense for the AI industry to follow this path in the years ahead.
Note – we are already seeing this domain-specific trend emerge. Take Consensus, for example. It is an AI-powered search engine that focuses only on scientific subjects.
Idea – companies that want to gain a competitive advantage in AI should focus on creating domain-specific LLMs for niche, cash-rich knowledge domains. It certainly seems to be an approach that Google is endorsing.
The special-purpose technology solution
To date, AI platforms have relied on costly and energy-hungry GPUs to train and run their LLMs. In fact, the biggest AI players have been spending colossal sums on these GPUs (and their associated data-centre infrastructure).
In its last earnings call, Alphabet, the parent company of Google, announced it had spent $50.6 billion on infrastructure in the last quarter – up from $30.6 billion during the same quarter last year.
The likes of Alphabet are cash-rich – but such jumps in infrastructure spending will be noticed by even the biggest of computing giants.
Not only does all this infrastructure cost vast gobs of money – it also appears as though the industry is starting to see something of a ceiling on GPU performance in relation to LLMs.
As the size of LLMs has increased, memory access bottlenecks have appeared. In other words, LLMs are becoming so big that even the latest GPUs are struggling to handle them.
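To see why, here’s a rough back-of-the-envelope sketch (in Python) of the memory-bandwidth ceiling on token generation. All of the figures are illustrative assumptions rather than measurements of any particular GPU or model.

```python
# Back-of-envelope sketch of why memory bandwidth, not raw compute,
# often caps LLM inference speed. Illustrative figures only.

params = 70e9            # assume a 70-billion-parameter model
bytes_per_param = 2      # 16-bit weights
weight_bytes = params * bytes_per_param   # ~140 GB of weights

hbm_bandwidth = 3.35e12  # assume ~3.35 TB/s of on-package memory bandwidth

# Autoregressive decoding touches (roughly) every weight once per generated
# token, so the bandwidth-bound ceiling on generation speed is approximately:
tokens_per_second = hbm_bandwidth / weight_bytes
print(f"~{tokens_per_second:.0f} tokens/s, ignoring compute and the KV cache")

# Note the model doesn't even fit in a single GPU's memory at this size,
# which is the other half of the bottleneck: weights must be sharded
# across devices, adding interconnect traffic on top.
```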
Then there is the question of energy. Blackwell, NVIDIA’s latest GPU, runs five times faster than its predecessor (helping to alleviate the memory access bottleneck problem), but uses 70% more power in the process.
What’s the solution? AI-specific computing hardware.
GPUs were never designed for AI (at least, not until recently). They existed to make your video games look fantastic or to help creatives edit stunning visuals and videos. Using them to run LLMs came about because there wasn’t really any alternative technology.
That’s all changing. Google and others are developing processors that are explicitly built to run LLMs.
Let’s have a look at one of the earliest, and most prominent, examples: Google’s Tensor Processing Unit (TPU).
MIT describes how the TPU architecture works:
‘The TPU contains thousands of multiply-and-add units directly connected in a giant grid. The TPU loads the data from external memory into its grid, where it flows through in regular waves, similar to how a heart pumps blood. After each multiplication the results are passed to the next unit. By reusing data from previous steps, the TPU reduces the need to access the off-chip memory. TPUs are a type of ‘domain-specific architecture’ (DSA): processors that are hard-wired for one purpose’.
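For the curious, here’s a toy NumPy sketch of the data-reuse idea described above – accumulating a matrix product as a series of ‘waves’ of multiply-and-add operations, so each slice of the inputs is fetched once and reused across the whole grid. It is a conceptual illustration, not how a real TPU is actually programmed.

```python
# Toy illustration of the data-reuse principle behind a systolic array:
# each "wave" streams one slice of the inputs through the grid of
# multiply-and-add cells, and partial sums accumulate in place instead of
# being written back to and re-fetched from off-chip memory.

import numpy as np

def systolic_style_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    # k "waves": each step contributes one rank-1 update to every
    # partial sum at once, mirroring how results flow from one
    # multiply-and-add unit to the next.
    for step in range(k):
        out += np.outer(a[:, step], b[step, :])
    return out

a = np.random.rand(4, 3)
b = np.random.rand(3, 5)
assert np.allclose(systolic_style_matmul(a, b), a @ b)
```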
The TPU is not the only AI-specific hardware on the market. We are now also seeing the rise of NPUs (neural processing units). As the name suggests, these are chips designed to accelerate AI processing. In an interesting development, NPUs can be either standalone chips or integrated into CPUs and GPUs.
The good news is that AI-specific processing hardware (be it a TPU or an NPU) seems to be not only faster, but also more energy efficient – particularly when compared to generalist GPUs. This will certainly go some way towards alleviating AI’s energy quandary.
Other good news is that progress on this front is rapid. We are already seeing laptops come to market that feature NPUs, enabling on-board, locally installed LLMs. Lenovo appears to be one of the quickest actors in this regard, with laptops featuring integrated NPU chips married to powerful, yet efficient, GPUs and CPUs.
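As a flavour of the kind of workload these machines are aimed at, here’s a minimal sketch of running a small, quantised LLM entirely on-device using the llama-cpp-python bindings. The model file name is a hypothetical placeholder, and whether the work is actually offloaded to an NPU depends on the vendor’s runtime rather than on this code.

```python
# Minimal sketch of fully local LLM inference on a laptop-class machine:
# no cloud API calls, just a small quantised model file on disk.
# The model path is a hypothetical placeholder.

from llama_cpp import Llama

llm = Llama(
    model_path="models/small-domain-model.q4.gguf",  # hypothetical local model
    n_ctx=2048,  # modest context window for a laptop-class device
)

response = llm(
    "Summarise the key risks in this quarterly report:",
    max_tokens=128,
)
print(response["choices"][0]["text"])
```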
Idea – if you’re upgrading your computer hardware, make sure you future-proof it by investing in machines that feature AI-specific processors.