What is AI – For the Layman

Seriously, what the heck is AI? Because no one seems to know. The news talks about AI changing the world, and its wants and feelings. I hear people talk about the dumb mistakes AI makes and tell stories about how it will take over our jobs. I’ve spent the last few years creating AI models, implementing AI tools, and generally being nerdy about it. Let me give you what I’ve learned and tell you what AI is…and what it isn’t.

Surely no one knows?

I’m not a genius, but this is enough in my wheelhouse that I can tell you it’s not as complex as you think. It’s not complex like a human brain; it’s more like a pile of rice. If you look at a pile of rice you couldn’t tell me how many grains there are, how much they vary from grain to grain, or which direction each one is pointing. But it’s all just rice.

Ok then, what is AI?

Glad you asked. I’ll start off with Large Language Models (LLMs). Those are the things we’ve been playing with, like ChatGPT and Google’s answer section. Let’s go basic: how do we make one?

Let’s start with a page that has 128 words on it. Let’s create a unique number for each one of those words and convert the page into numbers. Then let’s measure the relationships between those numbers, like:

  • This one often comes after that one.
  • That one is three away from this one and five away from another one.
  • This number is used a lot and that one isn’t.

We can use all these relationships to create a chart of probabilities. If I have a number, what are the odds that a particular number will come next? Or, given a number, what’s the most likely number to follow it? Take that chart of probabilities and we’ve created what’s called a Statistical Model.
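
If it helps to see that counting spelled out, here’s a tiny sketch in Python. It uses a made-up fourteen-word “page” instead of the 128-word one, and it only tracks the simplest relationship (“this one often comes after that one”), but it’s the same idea:

```python
# Count which word follows which in one tiny "page".
# Toy example only: a 14-word page instead of a 128-word one.
from collections import Counter, defaultdict

page = "i want to go to the store and then i want to go home"
words = page.split()

follow_counts = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

print(follow_counts["to"])    # Counter({'go': 2, 'the': 1})
print(follow_counts["want"])  # Counter({'to': 2})
```

That dictionary of counts is the whole “chart”: for any word, it tells you which words have followed it and how often.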

Ugh, math.

No worries, this is all conceptual. We’ll have the computers do the math for us, and we’ll end up with a model of those 128 words. Not really useful on its own, but let’s take that example above and apply it to another 128-word document, then combine the two charts we make into a bigger Statistical Model. With two we’ll have a larger variety of words, and the common words will pop up more often. Now let’s repeat that, like a billion times.
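
And combining two of those charts is nothing fancier than adding the counts together. A quick sketch, again with made-up mini documents:

```python
# Combining two charts is just adding their counts together.
# Made-up mini documents, same counting trick as before.
from collections import Counter, defaultdict

def count_follows(text):
    counts = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def merge(chart_a, chart_b):
    combined = defaultdict(Counter)
    for chart in (chart_a, chart_b):
        for word, followers in chart.items():
            combined[word].update(followers)  # adds the counts together
    return combined

doc1 = count_follows("i want to go to the store")
doc2 = count_follows("we want to go to the park")
big_chart = merge(doc1, doc2)
print(big_chart["the"])  # Counter({'store': 1, 'park': 1})
```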

That sounds like a lot.

It is a lot, and I bet we’d have a good understanding of the probabilities in how words relate to each other. So if I say “I want to go to the” you’d probably be able to guess what the next word might be. You’d say store, mall, park, airport, or something like that. If I pulled out that Statistical Model, I bet we could figure out what the odds of each one of those are. Maybe store has a 35% chance of being the next word if we had “I want to go to the” before it.
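
Turning those counts into odds is just division: each count over the total. Here’s a sketch with counts I made up so that “store” lands at that 35%:

```python
# What fraction of the time does each word follow "I want to go to the"?
# These counts are invented for illustration, picked so "store" is 35%.
from collections import Counter

followers = Counter({"store": 35, "mall": 25, "park": 20,
                     "airport": 15, "moon": 5})

total = sum(followers.values())  # 100
for word, count in followers.most_common():
    print(f"{word}: {count / total:.0%}")
# store: 35%, mall: 25%, park: 20%, airport: 15%, moon: 5%
```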

So, you want to go to the store?

Maybe, but the point is we’ve got a system for predicting the next word in a sequence of words. So, let’s take a few billion documents and make a statistical model. What do we have? A lot of data telling us the odds of words in conversation. That’s what a Large Language Model like ChatGPT, Google Gemini, or even Microsoft Copilot is: a big ledger of statistics.

So how does it know so many things?

Think about it: where do we find billions of documents where we can measure the relationships of words? The Internet, which includes Wikipedia, Facebook, technical manuals, news articles, and blog posts like this.

Ah, it knows all the things.

No. This is where the disconnect in how people understand AI happens. Remember how we trained the model above? It’s just the relationships between words. There are no articles or facts stored in the model, just the statistical relationships between words. If I ask “How many feet tall is the Empire State Building?”, what are the words most associated with that combination of words, in that order? They probably include things like “1250”, “feet”, “height”, etc., but you have to start the sentence first, so the most likely next word might be “It”, then “is”, then “1250”, “feet”, “tall”, and “.”. Yes, punctuation is part of these models.

So it doesn’t know how tall the Empire State Building is; it knows which words are most associated with the words in the question. When you put in a prompt like “How many mother sauces are there?” it calculates the next word. Let’s say it’s “There” (capitalized to start the new sentence). It will then do the whole calculation for “How many mother sauces are there? There” and it might come up with “are”. Then it runs it all again on “How many mother sauces are there? There are” and comes up with “five”. And so on until we hit “How many mother sauces are there? There are five mother sauces.” At that point the odds of any particular word coming next are so low that it determines it should end there.
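
If you like seeing that loop written down, here’s a bare-bones version in Python. The most_likely_next_word function is a stand-in for the statistical model described above, not a real library call, and the scripted list of words is just there so the loop can run end to end:

```python
# The generation loop: keep asking "what's the most likely next word?"
# and tack it onto the text until the model decides to stop.
def generate(prompt, most_likely_next_word, max_words=50):
    text = prompt
    for _ in range(max_words):
        next_word = most_likely_next_word(text)
        if next_word is None:  # nothing likely enough left, so stop
            break
        text = text + " " + next_word
    return text

# A hand-scripted stand-in for the model, just so the loop can run:
scripted = iter(["There", "are", "five", "mother", "sauces.", None])
answer = generate("How many mother sauces are there?",
                  lambda text: next(scripted))
print(answer)
# How many mother sauces are there? There are five mother sauces.
```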

That’s a lot. I recommend re-reading the last two paragraphs. All an AI does is calculate the odds of one word following others. Does that make sense?

Maybe, but it sounds like it knows stuff.

It does, because we don’t talk gibberish. All the stuff we’re using to train the model is coherent sentences and paragraphs written by people. We know stuff, and so the words we use are most associated with the stuff we know. So if an AI puts together the most likely sentence, it will resemble the most likely sentence to come out of a human’s mouth, and we know stuff.

But it makes mistakes.

You bet it does. Since it’s all based on odds, the odds can be skewed in ways we weren’t expecting. Remember the Empire State Building above? What if, every time someone refers to the Empire State Building on the internet, they say it’s 102 floors tall? Maybe because 1250 feet is boring, or because 102 floors sounds more impressive; whatever the reason, let’s just say that 90% of the time, when someone on the internet refers to how tall the Empire State Building is, they say 102 floors. Now let’s train a model on that internet. When you ask “How many feet tall is the Empire State Building?” the most likely number associated with “Empire”, “State”, “Building”, and “tall” is 102. So that model will say “The Empire State Building is 102 feet tall.” You and I know that is wrong, but the statistical model shows the most likely number is 102.
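
Here’s that skew in miniature, with made-up counts where “102” shows up 90% of the time near the words about the building’s height:

```python
# Skewed training data, in miniature. The counts are invented so that
# "102" dominates, the way it does in the story above.
from collections import Counter

numbers_seen_near_tall = Counter({"102": 90, "1250": 8, "1454": 2})

most_likely_number, _ = numbers_seen_near_tall.most_common(1)[0]
print(f"The Empire State Building is {most_likely_number} feet tall.")
# "The Empire State Building is 102 feet tall." (wrong, but most likely)
```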

They’re doing better though.

We do our best to find ways to fine-tune the model, but that’s the hard part. The solution Google and ChatGPT use is to do a search first and then summarize the results with AI. The AI isn’t answering from its own statistics; it’s just re-writing the answer from the search results.
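
In rough sketch form, that pattern looks like this. Both helper functions here are hypothetical placeholders, not real APIs: search_the_web stands for a normal search engine call, and summarize_with_llm stands for a call to a language model:

```python
# The "search first, then re-write" pattern, as a sketch.
def answer_with_search(question, search_the_web, summarize_with_llm):
    results = search_the_web(question)   # the facts come from the search
    prompt = ("Using only these search results:\n"
              f"{results}\n"
              f"Answer this question: {question}")
    return summarize_with_llm(prompt)    # the model just re-words them
```

The model never has to “know” the height of anything; it only has to re-word whatever the search handed it.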

Funny Story: Back in 2020, as the public started interacting with these models and using them for wider applications, they discovered that the models were returning racist and sexist results. “But we trained them with the internet?” Turns out the internet is racist and sexist. They had to create a fine-tuning dataset that gives negative statistical associations to racist word combinations.

So they’re not self-aware?

We are so far from self-aware that I find it comforting. To be self-aware, we would first need a general-purpose AI: a machine that runs on its own, collects data, and makes decisions about non-specific tasks. There are a lot of things that look like general AI, but they’re just clever programming. No independent thinking is going on.

Another Funny Story: People often talk about AI saying it wants to take over the world or that it feels sad. Remember, these models were trained on books, and when we write about AI, what do we write? It wants to take over the world, it’s sad, it’s in love, it’s a person. The answers are exactly what we wrote.

So what can they do?

We can train AI models to predict the odds in anything that works like language, or anything similarly complex.

  • Image Fill – With images we do the same thing we did with the 128-word document above. Take a 16×16 pixel image and convert the 256 pixels into numbers representing the color of each one. Then we measure the relationships to produce probabilities in an image model. Do that enough times with large images and we can give it an image and it can fill in what’s outside the image by predicting what the next pixel would be, knowing the previous pixels (there’s a tiny sketch of this right after the list).
  • Image Gen – We can add another dimension to the image model by associating words with each picture so the model can convert pictures into words, or words into pictures.
  • Video Gen – Add another dimension by using pictures in sequence, i.e. this picture comes after this one and before that one. That can be used to predict what the next frame would be, like a movie projector running forward.
  • Large Language Models (LLMs) – This is what you’ve probably got the most experience with, and they seem pretty impressive. They’re all trained on something called a corpus, which is a collection of text like Wikipedia. There aren’t a lot of really big corpora around, so all these models use the same stuff for training: Common Crawl, Wikipedia, and book collections like BookCorpus.
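
Here’s the tiny pixels-as-numbers sketch promised in the Image Fill bullet: a 4×4 toy “image” instead of a 16×16 one, using the same counting trick as the word model to guess one missing pixel. Real image models are far more sophisticated; this is just the flavor of the idea:

```python
# A 4x4 grayscale "image" is just 16 numbers. Guessing a missing pixel
# from its left-hand neighbour uses the same counting trick as the
# word model. Toy data, not a real image model.
from collections import Counter, defaultdict

image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, None],  # one missing pixel to fill in
]

# Count which pixel value tends to follow which, scanning left to right.
follow_counts = defaultdict(Counter)
for row in image:
    for left, right in zip(row, row[1:]):
        if left is not None and right is not None:
            follow_counts[left][right] += 1

left_neighbour = image[3][2]  # the pixel just before the gap: 255
prediction, _ = follow_counts[left_neighbour].most_common(1)[0]
print(prediction)  # 255, the value most likely to follow a 255
```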

I’ve used them to write blogs.

And put a blogger out of a job? Probably not. This post is already long so I’ll talk about jobs another time. For now I just want you to know that AI is useful, complex, and so helpful for writing papers. It is not self-aware, thinking, or sad. Now go! Go and be smarter than everyone else in the room about AI.