LLM architectures in production – winter ’24 edition

Dec 4, 2024 • 4 min read

As far as I can tell, state-of-the-art generative AI architectures all follow more or less the same pattern: essentially, everything is in service of crafting the perfect question to the LLM, which we ungenerously think of as some kind of scatter-brained oracle that can only consider so many things at once.

If you have ever listened to Dan Carlin’s podcast you know that one can never have enough context; this is not true for LLMs, however: not only is adding more context not guaranteed to make things better, but there are also hard limits. Current LLMs all use the transformer architecture, at the core of which is the now-famous self-attention mechanism, which, for each new word a transformer generates, computes weights for all prior words in the sequence. This means that inference time scales quadratically with the length of the prompt (though modern optimizations like sparse attention and sliding window mechanisms reduce this overhead in specific scenarios).
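To make the quadratic growth concrete, here is a minimal sketch in plain Python. It is a toy, not how any production model is implemented: `causal_attention_weights` and `attention_score_count` are hypothetical helper names, and the scores are supplied by hand rather than computed from learned query/key vectors.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over a causally masked score matrix.

    scores[i][j] is the raw similarity between token i and token j.
    Token i may only attend to tokens 0..i (itself and everything before it).
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                        # drop scores for future tokens
        peak = max(visible)                           # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def attention_score_count(n_tokens):
    """Number of (query, key) pairs scored for a sequence of n_tokens."""
    return n_tokens * (n_tokens + 1) // 2

for n in (1_000, 2_000, 4_000):
    print(n, attention_score_count(n))
# 1000 500500
# 2000 2001000
# 4000 8002000
```

Doubling the prompt length roughly quadruples the number of scores, which is the scaling behaviour described above.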

You can get around this limitation by fine-tuning your model to bake the context into it. This method is expensive and more suitable for when you have a large, semantically meaningful dataset (e.g. patents, the law, medical textbooks) where you are essentially teaching the model a new dialect. Anecdotally, it takes about 10 occurrences of a piece of information in the training data set before the model will learn it. [1] Fine-tuning can be highly effective at directing LLMs in highly specialized cases. [5]

In-context learning (a fancy way of saying “just stuff it all in the prompt”) is, by comparison, limited by the context window size. Typical windows are 4-32k tokens; the max right now is in the 100-200k range, with GPT-4 Turbo and Llama 3.2 at 128k tokens, and Yi-34B and Claude 3.5 at 200k. For scale, a 32k window is equivalent to about 100 pages of English at 250 words/page. In exchange, in-context learning allows for immediate input updates without retraining, quick prototyping and easy deployment, and it helps mitigate the hallucination problem: “This pattern effectively reduces an AI problem to a data engineering problem” [1].
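As a back-of-the-envelope check on these sizes, the conversion from tokens to pages can be sketched as below. The 0.75 words-per-token ratio is an assumption (a common rule of thumb for English, not a figure from this post), and real tokenizers vary with the text; `tokens_to_pages` is a hypothetical helper.

```python
WORDS_PER_TOKEN = 0.75   # assumption: rough rule of thumb for English text
WORDS_PER_PAGE = 250     # page size used in the text above

def tokens_to_pages(n_tokens, words_per_token=WORDS_PER_TOKEN, words_per_page=WORDS_PER_PAGE):
    """Approximate number of pages that fit in a context window of n_tokens."""
    return n_tokens * words_per_token / words_per_page

for window in (4_000, 32_000, 128_000, 200_000):
    print(f"{window:>7} tokens ~ {tokens_to_pages(window):.0f} pages")
#    4000 tokens ~ 12 pages
#   32000 tokens ~ 96 pages
#  128000 tokens ~ 384 pages
#  200000 tokens ~ 600 pages
```

Under that assumption, even the largest windows hold a few hundred pages at most, which is small next to a real document corpus.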

Precisely because the context window is currently limited to a precious few pages of text, the name of the game is to essentially craft the best prompts possible. [3] [6]

There are basically two parts to doing this:
- An ingestion path, which is responsible for creating and managing the source data that the system will later consume
  - I think of this as building a good library, with indices and references
  - The main building blocks are: document sources, a document processing pipeline and a storage layer.
  - In the simplest form, the storage layer can just be a folder with a bunch of text documents and a good file naming scheme, because every data retrieval solution is ultimately a more sophisticated version of this.
    - In reality there are a few distinct data types we care about here: raw data, metadata and embeddings.
- A request fulfillment path which uses the above data to prepare and construct the generation context for the LLM, runs said generation, does validation and returns. Building blocks are:
  - A prompt augmentation pipeline which I think of as a good librarian that can retrieve the exact pages/paragraphs you need.
    - This is an especially apt metaphor for vector store retrieval, where we use the embedding model to convert the user query into embeddings, then use similarity search to find the most relevant document chunks.
  - An LLM layer
    - Guardrails (rules and filters that constrain LLM outputs to prevent undesirable responses and ensure outputs align with intended use cases and safety requirements)
    - Quality (automated evaluation, drift monitoring, metrics for accuracy, safety and bias)
    - LLM Ops (monitoring, versioning, testing, security, and performance optimization like caching)
  - An API layer to expose all the above to the outside world
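
The two paths above can be sketched end to end in a few lines of Python. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the "storage layer" is an in-memory list, and every function name here is hypothetical rather than taken from any particular library.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=40):
    """Ingestion: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding: word counts. Assumption: a real system would call
    an embedding model here and get back a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def ingest(documents):
    """Ingestion path: keep raw text, metadata and embedding side by side."""
    store = []
    for name, text in documents.items():
        for position, piece in enumerate(chunk(text)):
            store.append({"text": piece, "source": name,
                          "position": position, "embedding": embed(piece)})
    return store

def retrieve(store, query, k=2):
    """The 'librarian': rank stored chunks by similarity to the query."""
    q = embed(query)
    return sorted(store, key=lambda row: cosine(q, row["embedding"]), reverse=True)[:k]

def build_prompt(store, query, k=2):
    """Prompt augmentation: prepend the retrieved chunks to the question."""
    context = "\n".join(f"[{row['source']}#{row['position']}] {row['text']}"
                        for row in retrieve(store, query, k))
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"

docs = {
    "vacation.txt": "Employees accrue two vacation days per month of service.",
    "expenses.txt": "Expense reports must be filed within thirty days of purchase.",
}
store = ingest(docs)
print(build_prompt(store, "How many vacation days do employees accrue?", k=1))
```

The resulting prompt string is what would be handed to the LLM layer; guardrails, quality checks and the API layer would wrap around that call.
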
Key architectural tensions center around embedding model selection (local vs. hosted), vector store scalability, prompt engineering automation, the balance between real-time and batch processing for document ingestion, and maintaining consistency between document stores and their vector representations in production.

As for storage, while local-first [9] architectures using browser storage and WebAssembly are feasible for simple applications, it is my opinion that enterprise-grade systems that require precise data lineage, versioning, and attribution will need a server-side architecture with proper database management systems to maintain referential integrity and temporal consistency of source materials.
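One way to picture the lineage and consistency requirement is to record, next to every embedding, which document version and content it was computed from. The sketch below is an assumption about how that could look, not a description of any specific database or vector store; `SourceDocument`, `EmbeddingRecord` and `stale_embeddings` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class SourceDocument:
    doc_id: str
    version: int
    text: str

@dataclass(frozen=True)
class EmbeddingRecord:
    doc_id: str
    doc_version: int
    source_hash: str   # hash of the text the vector was computed from
    vector: tuple

def stale_embeddings(documents, embeddings):
    """Return embedding records that no longer match their source document."""
    current = {d.doc_id: d for d in documents}
    stale = []
    for record in embeddings:
        doc = current.get(record.doc_id)
        if (doc is None
                or doc.version != record.doc_version
                or content_hash(doc.text) != record.source_hash):
            stale.append(record)
    return stale

doc_v1 = SourceDocument("handbook", 1, "Employees accrue two vacation days per month.")
record = EmbeddingRecord("handbook", 1, content_hash(doc_v1.text), (0.1, 0.3))
doc_v2 = SourceDocument("handbook", 2, "Employees accrue three vacation days per month.")

print(len(stale_embeddings([doc_v1], [record])))  # 0
print(len(stale_embeddings([doc_v2], [record])))  # 1
```

In a server-side setup the same check would typically be a foreign key plus a version column, which is the kind of referential integrity a browser-only store makes harder to guarantee.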

There is also a developing notion of “edge AI” [8], which is more about pushing the model and inference onto the device. A hybrid approach could also be adopted where sensitive data is kept locally, and only the relevant bits are sent to the model for processing. [12]
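
One possible reading of the hybrid approach is local redaction: sensitive values are swapped for placeholders before the text leaves the device, and restored in the model's answer afterwards. This is only an illustration under that assumption; the `redact` and `restore` helpers and the simple e-mail pattern are hypothetical, and a real system would need to cover far more than e-mail addresses.

```python
import re

# Deliberately simple pattern for illustration; not a full e-mail validator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text, vault):
    """Replace e-mail addresses with placeholders; originals stay in the local vault."""
    def swap(match):
        token = f"<EMAIL_{len(vault)}>"
        vault[token] = match.group(0)
        return token
    return EMAIL.sub(swap, text)

def restore(text, vault):
    """Put the original values back into text returned by the model."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text

vault = {}
outgoing = redact("Contact ana@example.com about the invoice", vault)
print(outgoing)                   # Contact <EMAIL_0> about the invoice
print(restore(outgoing, vault))   # Contact ana@example.com about the invoice
```

Only `outgoing` would be sent to the hosted model; the `vault` mapping never leaves the device.
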
On contemporary conservation

Sep 5, 2024 • 8 min read

On a recent visit to Europe, I had the opportunity to engage in the tourist-type activities that often escape my short and utilitarian visits. It was interesting to me what stood out to my now-Americanized eyes – that is to say, after a dozen or so years spent in a country where I am not often and not really confronted with the complexities of dealing with the legacy of a long history.

How long is the past? Growing up in Europe did promote within me a notion that the tangible past is at least a couple thousand years old. Maybe not quite bronze-age-old, but when the streets you travel on every day were named by the literal Romans, and walking into buildings built over a thousand years ago barely registers as even a curiosity – that does something to you.

Namely, it makes you feel emotionally connected to the past: battles fought, centuries-long rivalries, arcane political power-plays or the lives of obscure men feel like they actually matter. The thought of occupying the same physical space as people whose only real characteristic is that they existed a long time ago also may make you feel like you are taking part in a grand whole.

And while lousy, this generic feeling of attachment is sufficient to underpin the perpetuation of eternal rivalries as silly (I am told Italians are supposed to dislike the French, for example) as they are toxic and foundational to the continental character. A storied legacy is not the same as the study of a long history.

I found this kind of mindset to be almost wholly absent in the US – yes, at great cost and despite their best effort, but that’s beside the point here: all I am saying is that the root system doesn’t go quite as deep, and that this slight change in perspective accounts for most of my observations below.

So as I was visiting Saint-Jacques tower in Paris and the guide was explaining its tortuous history, my attention was drawn by the room dedicated to the details of its most recent restoration. I found it interesting that a process meant to enhance an exhibit became itself a part of the exhibit – a pattern that repeated itself in a great many of the churches, buildings, sites and museums I visited.

This, alongside a certain narrative that positions contemporary conservation as an essentially neutral act (“alter the form but not the essence”), shrouded with the language of science in an attempt to characterize a political choice as an objective and technical process (not unlike contemporary capital punishment) – made me reconsider the nature of this whole interaction and of the artifacts themselves.

It is entirely obvious that the decision of what artifacts to conserve, ignore and restore is political; sticking to French history, I would argue that there isn’t in fact any substantial difference between burning a painting of the King during the revolution versus restoring Notre-Dame after it burned down in 2019: both are politically-motivated, culturally-significant acts of engaging with an historically significant artifact.

The Tower of Saint-Jacques offers an excellent example of the ebbs and flows that might bless and vex an artifact: originally part of the gothic-style Church of Saint-Jacques-la-Boucherie, which was built by the professional order of butchers in the early 16th century and then largely demolished during the French Revolution, the tower somehow was preserved, albeit in bad shape. It then got restored during the Second Empire gothic revival phase. The political forces at play are clear to see.

Then, crucially, it got restored again in 2009. This time around the stated motivation wasn’t, say, “reconciling with the medieval past and asserting a sense of historical continuity and pride amid political modernization” like it could have been in 1855. The stated motivation was technical in nature: structural issues, cracks, cleaning up the stone from pollution – all in respect of the architectural integrity of the building. Ostensibly, a neutral act.

Respect of the original, reversibility, integrity – all tenets of modern conservation that somehow suggest a kind of restoration structurally different from those of a past in turmoil. The passage of time and styles is there for all to see in the walls of buildings that got built and re-built over centuries (Mont-Saint-Michel is a great example from this trip, but St Peter’s in Rome or the cathedral of Seville also come to mind), carrying layers of styles like bedrock carries the scars of geological eras – but no more. For some of these buildings, it feels like history is over and they shall remain frozen in time.

This is certainly not a general statement, as examples of unafraid engagement with our historical artifacts abound: the Louvre pyramid, the Calatrava bridge in Venice, the Millennium Bridge by St. Paul’s in London, the new Reichstag in Berlin, etc. It is also not a new attitude: the very reason we have historical artifacts is precisely that people have been preserving them since the dawn of time (to support narratives, creation of in-groups, etc). So you could argue that this is nothing new.

Yet, I disagree. I think the claim of objectivity implicit in modern restorations highlights something more subtle than simply some run-of-the-mill political hypocrisy. To me, it’s all about technology.

I can only speak to my experience, but I have a really hard time relating to the life of a 15th century peasant (the target audience for a lot of this art); I don’t mean conceptualizing it, I mean really empathizing with it. I know too much. My brain has been bombarded with orders of magnitude more units of information than anyone even a hundred years ago had access to. I take it for granted that the earth is round and that the force I succumb to due to my poor motor skills is the same one that keeps satellites and planets in orbit. I know that computers can be made. My life experience is radically different, too: bubonic plague is a non-concern, and neither is food supply. There is always electricity. I have in the last few weeks traveled more than the average King would have in a lifetime, only a short couple hundred years ago. Living in our times also does something to you.

All this is not to say that the nature of the human condition has fundamentally changed, far from it. Art production has not stopped, only expanded. I empathize with the very human plights of love and loss in Virgil as easily as with those in “Inside Out”, laugh at Lysistrata as much as at standup comedy, and relate current political events to Shakespearean plays (isn’t Biden probably confronting feelings very similar to King Lear’s? Wasn’t his decision to step down ultimately motivated by not wanting to be another Richard II?). These things do not change.

But for artworks and artifacts that are more situational and less intellectual, which relate to life circumstances (buildings) or political climate (propaganda), my current frame of reference feels far more removed than – I suppose – it would have felt had I lived in the 18th century and been looking back two hundred or so years. In a bit of a crass oversimplification, I credit technological progress for most of this hiatus.

This other-ing creates a certain awkwardness: how am I supposed to relate to a worldview that is entirely un-relatable to me, or that I know for a fact is not true? I believe the algid, impersonal, a-chronic style of conservation outlined above is an attempt at addressing this very question by not really answering it.

But other-ing has also some material advantages, such as allowing for the notion that we have departed and overcome this older, harsher, more brutal world and are now modern men and women, protective of this unusual stretch of peace we are holding on to, delegating war to robots and abstractions, living longer than ever before, participating in a collective society and a small world, and denying death at every turn. We don’t really do revolutions like they did back then! We don’t really do wars like even our grandads did!

Except of course we do – but in a profoundly different information environment, with tools at our disposal that would have been unthinkable to a middle-ages warlord. But we still have kings, we still have revolutions, we still have wars. The context of it all has changed so much that it makes me (and maybe us) feel like it’s a different thing entirely.

Surely this is a highly partial viewpoint too: my life experience is also not representative of anything other than that of a white dude who grew up in the west. I don’t purport to be anything else. In fact, I would welcome a differing viewpoint (comments are open).

Western philosophy has graced us with the notion of hermeneutics, as an attempt to answer this question. My feeling is that the problem runs much deeper: the very existence of a discipline concerned with the question of how we relate to our past is indicative of a degree of societal consciousness and sophistication that in itself renders that very past “other” – until the next revolution or war.
Ficcional quotes

Aug 20, 2024 • 3 min read

Part of the fun of reading Borges is peeling the layers: structural, narrative, symbolic, no matter – there’s a rabbit hole waiting in earnest. One such type of rabbit hole consists in trying to follow his quotes, whether real or made up.

In some cases they are obviously made up, and the clue is literally in the title: “The Approach to Al-Mu’tasim” (the story) is a discussion of “The Approach to Al-Mu’tasim” (a fictional story), set up with such surgical distance as to almost recall the quadruple framing of MP Shiel’s “The Purple Cloud”. Similarly, the Don Quixote written by Pierre Menard, the work of Herbert Quain and the infinite volumes of the Library of Babel are avowedly fictional entities.

Erudite literary plot devices needn’t be in the title to provide fundamental support to a story (“The Enemies” by Jaromir Hladik), mild support (the “Anglo-American Cyclopaedia” of “Tlön, Uqbar, Orbis Tertius”) or minor support (the Gaelic translation of Shakespeare’s “Julius Caesar” mentioned in “Theme of the Traitor and the Hero”), and those are found so often and so profusely throughout the book as to barely register.

Sticking to “The Secret Miracle”, in a mildly more interesting narrative contraption it is revealed that Jaromir published a fictional translation of the Sefer Yetzirah for the very real Hermann Barsdorf publishing house (which, while no longer in existence, most famously published Freud’s books).

Borges revels in this type of tomfoolery. For example, in “Death and the Compass” we learn that the just-murdered Marcel Yarmolinsky had authored a translation of the Sepher Yezirah himself, as well as a “Vindication of the cabala”.

Hladik had apparently also written something by a similar title.

Unlike with the character in “Death and the Compass”, in this case Borges ventured into a much more interesting exercise: actually discussing the contents of a fictional book, stuffing entire worlds into just a few dense sentences.

The entirety of “Tlön” is this, but sticking to strict literary quotes:

Or this gem from “Menard”

Finally, a less creative but just as evocative and effective trick is to accuse a fragment of being made up.

[…] afterwards, in the enormous dialogue of that night, I learned that they made up the first paragraph of the twenty-fourth chapter of the seventh book of the “Historia Naturalis”. The subject of this chapter is memory; the last words are “ut nihil non iisdem verbis redderetur auditum”

This fragment very much exists, but there is of course no indication that it is made up.

Conversely, sometimes a quote is so rarefied that even if it did exist, it is so hard to attribute that it might as well be made up. For example the “Robertson” quote here conceivably refers to Frederick William Robertson (a reverend known for his sermons).

Of the surviving, internet-accessible bibliography, I could find nothing that really substantiated this quote. Maybe he made it up, maybe it was lost, maybe there is no difference. “My solitude rejoices in this elegant hope”.
