The Artifacts of Language Models

Samuel Gbafa
Published in Samuel Synthesized
4 min read · Feb 2, 2021


I’ve been reading The Almanack of Naval Ravikant and have really been enjoying Naval’s perspectives. Considering how the author put this work together, I keep coming back to the artifacts Naval has left behind and how useful they have been to learn from. As a very private person, I’ve also been thinking about the artifacts in my own life that are visible and their impact. The code I write, the tweets I like, and the photos I’ve shared are all glimpses of me. But how useful are they when trying to reconstruct where I was at a moment in time? How rich is the data I leave behind? Should I leave more?

An artifact is “something characteristic of or resulting from a particular institution, period, trend, or individual”; in this post, I’m generally referring to the text and posts we leave on the internet.

Language models are currently trained by crawling the internet, collecting a large corpus of “high-quality” text like articles, papers, and blog posts, and predicting the next word across all of that data; in other words, they learn from the artifacts that humans have left behind. For example, the recent hype about GameStop ($GME) on r/wallstreetbets has left artifacts of the community’s collective thought process in selecting $GME. Their goal wasn’t to document that thought process; the documentation was a byproduct of their communication on Reddit. It’s rich with the culture and values of that particular community, and if you were to fine-tune a language model to post in that community in a convincing way, using the same lexicon, average post length, sentiment, and so on, the details are there for a sufficiently capable language model to do so.
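
To make that training setup concrete, here is a minimal sketch of fine-tuning a causal language model on scraped community posts using the Hugging Face transformers library. The file name `wsb_posts.txt`, the base model, and the hyperparameters are placeholders chosen for illustration, not a description of any particular system.

```python
# Minimal sketch: fine-tune a causal LM on community posts via next-word prediction.
# The posts file, base model, and hyperparameters are placeholders for illustration.
import torch
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()

optimizer = AdamW(model.parameters(), lr=5e-5)

with open("wsb_posts.txt") as f:  # hypothetical file of scraped posts, one per line
    posts = [line.strip() for line in f if line.strip()]

for post in posts:
    inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=512)
    # With labels == input_ids, the model computes the shifted next-token
    # cross-entropy internally, which is exactly the "predict the next word" objective.
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```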

What are the artifacts generated by language models and what can they tell us?

As a field, we’ve identified the artifacts generated by humans and found great ways to use them for training language models (LMs). But what are the artifacts generated by LMs, and what can they tell us? I’ve been thinking about this question in two different domains, represented by the following questions:

  • What artifacts can you create during experimentation that give you good insight into how well your language model “understands” language (beyond cross-entropy loss and perplexity; these baseline metrics are sketched just after this list)?
  • Can you determine if text is a human artifact or the result of a language model?
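
For context on the first question, here is a small sketch of the baseline metrics it refers to: a language model’s cross-entropy loss on held-out text, and perplexity, which is just the exponential of that loss. The model and the sample sentence are placeholders.

```python
# Sketch of the standard evaluation metrics: cross-entropy loss on held-out text
# and perplexity = exp(cross-entropy). Model and sample text are placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The stock rallied after the earnings call."  # hypothetical held-out sample
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the returned loss is the mean cross-entropy
    # of the model's next-token predictions over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"cross-entropy: {loss.item():.3f}  perplexity: {torch.exp(loss).item():.1f}")
```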

The first question is more immediately relevant to my research and a current topic of exploration for me; I plan to share my findings here at a later time.

The latter question has been bothering me more recently, especially when considering the implications of the powerful generative models that transformers enable, particularly when scaled. Consider this example from r/wallstreetbets.

[Screenshot of repeated bot posts on r/wallstreetbets. Caption: “Bots have no shame.”]

Here someone is using a simple bot to post the same content over and over again. As a human, you can pretty easily identify that this is a bot because the artifacts make it obvious: the same text repeated, probably by the same user. But what happens when each of these posts begins to vary, perhaps because they are generated by an LM? Our human ability to discern whether something is an artifact created by a bot will rapidly be outpaced by the ability of LMs.
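
As a toy illustration of why this kind of bot is easy to catch, and why an LM that paraphrases each post would not be, consider a simple check for exact duplicates per user. The usernames and post texts below are made up.

```python
# Toy duplicate-post check: exact repetition by the same user is trivial to flag.
# The usernames and post texts are invented for illustration.
from collections import Counter

posts = [
    ("user_a", "GME is the play, diamond hands"),
    ("user_a", "GME is the play, diamond hands"),
    ("user_a", "GME is the play, diamond hands"),
    ("user_b", "Holding through the squeeze, good luck everyone"),
]

for (user, text), count in Counter(posts).items():
    if count > 1:
        print(f"{user} posted the same text {count} times: {text!r}")

# A language model that rewrites each post with different wording produces no
# exact duplicates, so this artifact disappears and the check silently fails.
```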

The issue of sock-puppet accounts (accounts created by a single actor to represent a particular perspective) already exists, but scaling them requires a certain level of human effort. Using LMs to let them scale easily presents new problems, and we need new solutions to handle them. In my opinion, we need to consider generative models from an adversarial perspective and build the capability to determine whether content is generated by a human, an AI model, or a combination of the two. Assessing an artifact and determining whether it was created by a person or a model is a very hard problem. It is a current problem about to get worse on social networks like Twitter and Reddit, where a small actor can amplify a perspective. These platforms are democratic (content with more upvotes and retweets is seen more), and this type of problem is particularly dangerous to democratic systems, where representation of a perspective is power.
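
One way to read “consider generative models from an adversarial perspective” is as a supervised detection problem: collect examples labeled human versus model-generated and train a classifier to separate them. The sketch below uses a simple bag-of-words baseline with scikit-learn; the example posts are invented, and a real detector would need far more data and far stronger features than this.

```python
# A hedged sketch of one possible direction: treat detection as supervised
# classification over labeled human vs. model-generated text. The training
# examples are invented; a bag-of-words baseline like this is only a starting
# point, not a solution to the adversarial problem described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_posts = [
    "yolo'd my savings into GME, see you on the moon",
    "anyone else watching the short interest numbers today?",
]
generated_posts = [
    "I am very excited about this stock and I believe it will increase in value.",
    "The stock market is a place where people buy and sell shares of companies.",
]

texts = human_posts + generated_posts
labels = [0] * len(human_posts) + [1] * len(generated_posts)  # 1 = model-generated

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

print(detector.predict(["holding my shares through the squeeze"]))  # 0 or 1
```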

Using deep learning and AI tools gives individuals asymmetric power, but that power must be balanced by developing adjacent capabilities in a reasonable timeframe. If we are developing the ability to speak in a new way, we must also develop the ability to hear in a new way, so that we don’t create a world where we lose the signal in the noise. Preserving our collective ability to identify humanity in the artifacts entering our shared spaces is important to preserving our ability to navigate those spaces safely. Perhaps alternatives like a digital identity and proof-of-humanity (probably something better than proof-of-nipple lol) can signal that an artifact was created by a human with some stake in what they are communicating.
