kaukas

On LLMs

I like Gary Bernhardt. In fact, he might be my idol. He had original ideas and presented them very well. I learned a lot from his teachings.

Please watch this 2012 talk; it is well worth it.

In short, he listed several technologies that, upon their release, either “allowed saying new things” (capable) or “made innovations accessible for the industry” (suitable). According to him, LISP, C++, and Ruby innovated, while Fortran, SQL, and Java standardized and simplified.

I do not necessarily agree with the categorization, for example, SQL subjectively seems much more capable than flat files to me. However, reactions to these technologies/techniques are pertinent.

To contemporaries, capable systems typically seemed too slow or messy. Suitable systems, in contrast, seemed too limiting.

The talk was given at a Ruby conference and ended with a warning that “something slower than Ruby will show up one day”.


Well, Large Language Models showed up. They are not programming languages (or are they?), but in some cases they can still “be slower to code with” than the traditional approach. Still, LLMs allow us to say things in a new way, are progressing like crazy, and are a hot mess. Everyone has an opinion, and those opinions are rather polarized: either LLMs are the greatest invention since the Internet, or they can’t be used to do anything serious (or worse, NSFW). You should recognize how the latter reaction aligns with the reception of capable systems in Gary’s talk.

According to his model, a suitability response should follow. What will it be?


My thoughts below are completely uninformed and will likely age badly. Let’s laugh at them in 10 years.

The elephant in the room when serious people consider LLMs for serious business: hallucinations. If you use Copilot, code is practically free. Time to fire the programmers? Well, besides writing specifications, the programmer also has to review the code with 3x as much scrutiny; her artificial intern produces good code and nonsense with the same level of confidence. I hope you enjoyed writing docs and reviewing PRs before; that is most of your work now.

A suitable system has to improve reliability. How? A billion-dollar question. Let’s consider the kinds of improvements that happened during the C++-to-Java or flat-files-to-SQL transitions.

Could we constrain token sampling to limit LLM output to syntactically valid programs? The model would then only be capable of producing code that compiles. This is not a semantic correctness guarantee, but it is a step in the right direction.
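As a toy sketch of the idea: greedy decoding over a two-token “language” of balanced parentheses, where a grammar mask drops any token that cannot be completed into a valid program. The vocabulary, the random scoring, and the length cap are all made up for illustration; real systems apply the same masking against a full programming-language grammar.

```python
# Sketch of grammar-constrained decoding over a toy "language" of
# balanced parentheses. A fake model scores tokens at random; a real
# LLM would supply logits instead. The mask is the interesting part.
import random

VOCAB = ["(", ")", "<eos>"]

def still_valid(prefix, token, max_len=8):
    """Can prefix + token still be completed into a balanced string?"""
    s = prefix + ("" if token == "<eos>" else token)
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:              # closed more than we opened
            return False
    if token == "<eos>":
        return depth == 0          # may only stop when balanced
    return len(s) + depth <= max_len  # enough room left to close up

def constrained_sample(seed=0):
    rng = random.Random(seed)
    out = ""
    while True:
        # The "model": random scores per token.
        scores = {t: rng.random() for t in VOCAB}
        # The grammar mask: only consider tokens that keep us valid.
        legal = [t for t in VOCAB if still_valid(out, t)]
        best = max(legal, key=scores.get)
        if best == "<eos>":
            return out
        out += best

print(constrained_sample())  # always a balanced parenthesis string
```

Whatever the scores say, the output can only ever be a well-formed program of this grammar; that is the whole point of the mask.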

Could we invent new programming languages that are more suitable for LLMs to produce? Simpler, more regular syntax. Not necessarily English-based, or even text-based. A human would still review the result, but they could use a different, human-tailored representation, textual or graphical.

Could we peel layers of abstraction off? Having a driver, the TCP stack, a C API, language bindings, and a library on top is nice for a programmer who needs to make an HTTP POST. LLMs have no problem absorbing more things (complex interfaces) but, in my opinion, would prefer working with less leaky abstractions.
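To make the layering concrete, here is the same HTTP POST performed twice: once through the layered stdlib client, and once as raw bytes on a socket, one layer down. A throwaway local echo server stands in for the network; this is a self-contained sketch, and real clients also need TLS, redirects, and error handling.

```python
# The same HTTP POST at two levels of the stack, against a local
# echo server. Stdlib only; no network access required.
import socket
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Echo(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Echo)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# High in the stack: one call, many layers underneath.
reply = urllib.request.urlopen(
    f"http://127.0.0.1:{port}/", data=b"hello").read()

# Low in the stack: the protocol itself, spelled out by hand.
with socket.create_connection(("127.0.0.1", port)) as sock:
    sock.sendall(b"POST / HTTP/1.1\r\n"
                 b"Host: 127.0.0.1\r\n"
                 b"Content-Length: 5\r\n"
                 b"Connection: close\r\n\r\n"
                 b"hello")
    raw = b""
    while chunk := sock.recv(4096):
        raw += chunk

server.shutdown()
print(reply)                          # b'hello'
print(raw.split(b"\r\n\r\n", 1)[1])   # b'hello'
```

Both versions deliver the same five bytes; the difference is how many abstractions sit between the caller and the wire.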

Could we double down on validation?

Property based testing? LLMs can help define invariants.
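A minimal stdlib-only sketch of the idea: an invariant (“base64 round-trips losslessly”) checked against many random inputs. The invariant here is a stand-in for one an LLM might propose; real frameworks such as Hypothesis add smarter input generation and shrinking of failing cases.

```python
# Minimal property-based check, stdlib only. The invariant below is
# the kind of property an LLM could be asked to propose for a codebase.
import base64
import random

def check_property(prop, gen, runs=200, seed=42):
    """Assert that prop holds for many generated inputs."""
    rng = random.Random(seed)
    for _ in range(runs):
        case = gen(rng)
        assert prop(case), f"property failed for {case!r}"

# Generator: random byte strings of varying length.
def gen_bytes(rng):
    return bytes(rng.randrange(256) for _ in range(rng.randrange(64)))

# Invariant: encoding then decoding returns the original bytes.
def roundtrip(b):
    return base64.b64decode(base64.b64encode(b)) == b

check_property(roundtrip, gen_bytes)
print("round-trip property held for 200 random inputs")
```

The human (or LLM) only states the invariant; the machine grinds through the cases.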

Using other LLMs as QA? They can click around all day at virtually no cost, and will never get bored or complain.

Formal verification? Impractical for humans in move-fast-and-break-things domains, but how about for LLMs? Business process modelling is a similar idea, but at the business level.

Perhaps some combination of all of the above?


Disclosure: LLMs were not used to write this post.

Thoughts? Leave a comment