Generative AI hallucinations are the least of our problems
A year ago, AI was big news… if you were a data science geek or deeply concerned with protein folding. Otherwise? Not so much. But then along came generative AI (you know it as ChatGPT), and now everybody is psyched about AI. It’s going to transform the world! It’s going to destroy all “creative” jobs. Don’t get so excited yet, sparky! Let me re-introduce you to an ancient tech phrase: Garbage In, Garbage Out (GIGO).
Some out there think AI chatbots “think,” can learn, or at least fact-check their answers. Nope. They don’t. Today’s AI programs are just very advanced autocomplete, fill-in-the-blank engines. You’ve been using their more primitive ancestors in your email clients and texting programs to clean up your spelling for years.
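If you want a feel for just how literal “fill in the blank” is, here’s a minimal sketch of the idea in Python: count which word tends to follow which in a scrap of sample text, then “autocomplete” by always picking the most common follower. The training snippet and function names are made up for illustration; real LLMs learn probabilities over sub-word tokens with billions of parameters, but the core job (predict the next token from what came before) is the same.

```python
from collections import Counter, defaultdict

# Toy "autocomplete": count which word follows which in a scrap of sample
# text, then always predict the most common follower. Real LLMs learn these
# probabilities over sub-word tokens with billions of parameters, but the
# core job (predict the next token) is the same.
training_text = (
    "the force is strong with this one "
    "the force will be with you always "
    "the princess is on the ship"
)

follow_counts = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

def autocomplete(prompt: str, length: int = 5) -> str:
    """Extend the prompt by repeatedly picking the most frequent next word."""
    out = prompt.split()
    for _ in range(length):
        candidates = follow_counts.get(out[-1])
        if not candidates:  # never saw this word during "training"
            break
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

print(autocomplete("the force"))  # "the force is strong with this one"
```

ChatGPT’s version of this has vastly more context and nuance, but it is still prediction, not comprehension.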
True, what ChatGPT does, thanks to its large language model (LLM), is much more impressive. If I want it to write a short story about what Leia Organa, aka Princess Leia, was doing after Star Wars: A New Hope, it can do that. Did you know Chewie was teaching her how to use a blaster? Well, that’s what ChatGPT tells me, anyway.
That’s fine, but when I asked it to tell me about Red Hat laying off 4 percent of its workforce, even after I fed it the layoff memo, ChatGPT confidently told me that Red Hat’s CEO Paul Cormier had said… Wait. Cormier hadn’t been the CEO since July 2022.
Why did ChatGPT get that wrong? Well, I’ll let it tell you: “As an AI language model, my knowledge is based on the data I was trained on, which goes up to September 2021. I do not have any real-time updates or information on events or data beyond that date.”
So, even though I’d given it more recent data, which listed Matt Hicks as Red Hat’s current CEO, it still couldn’t incorporate that into its answer. Now, there is a way around this, but an ordinary user wouldn’t know that.
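For the curious, the workaround is to paste the fresh material into the prompt itself and tell the model to answer only from it. Here’s a rough sketch of what that looks like against OpenAI’s chat API, assuming the openai Python package (v1 or later) and an API key in your environment; the memo text, model name, question, and instructions below are placeholders, not anything OpenAI or Red Hat supplies.

```python
# A sketch of working around the knowledge cutoff: put the newer text in the
# prompt and tell the model to answer only from it. Assumes the `openai`
# Python package (v1 or later) and an OPENAI_API_KEY in the environment; the
# memo text, model name, and wording below are placeholders.
from openai import OpenAI

client = OpenAI()

layoff_memo = """(paste the actual Red Hat layoff memo here)"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": (
                "Answer using ONLY the document the user provides. "
                "If the document does not contain the answer, say so."
            ),
        },
        {
            "role": "user",
            "content": (
                f"Document:\n{layoff_memo}\n\n"
                "Question: Who is Red Hat's CEO, and what did he say about "
                "the layoffs?"
            ),
        },
    ],
)

print(response.choices[0].message.content)
```

The model still isn’t learning anything new; it’s just predicting its next words with the memo sitting right in front of it.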
Most people assume ChatGPT “knows” what it’s talking about. It doesn’t. It just predicts the most likely words to follow any query. So, it makes sense that everyone’s favorite Wookiee would get mentioned in a Star Wars story. But, when you’re dealing with facts, it’s another matter. These chatbots only “know” what’s baked into their LLMs.
Of course, because it doesn’t know Star Wars from Wall Street (it only knows which words are likely to follow other words), ChatGPT and its brethren will make things up out of whole cloth. We call these hallucinations.
Let’s look closer. While the suddenly not-so-open OpenAI won’t tell us what went into ChatGPT’s LLM, other companies are more forthcoming. The Washington Post, for example, recently analyzed Google’s C4 dataset, which has been used to train such English-language AIs as Google’s T5 and Facebook’s LLaMA.
It found that C4 pulled its data from such sites as Wikipedia; Scribd, a self-described Netflix for books and magazines; and Wowhead, a World of Warcraft (WoW) player site. Wait? What!? Don’t ask me, but there it is, number 181 on the C4 data list. That’s great if I want to know the best build for a WoW Orc Hunter, but it makes me wonder about the data’s reliability on more serious subjects.
It’s all about GIGO. If, as will happen soon, someone creates a generative AI that’s been fed mostly data from Fox News, Newsmax, and Breitbart News, and I then ask it, “Who’s the real president of the United States today?” I expect it will tell me it’s Donald Trump, regardless of who’s actually in the White House.
That’s about as subtle as a brick, but there’s already bias inside the data. We just don’t see it, because we’re not looking closely enough. For instance, C4 filtered out documents [PDF] containing what it deemed to be “bad” words. Because of this and similar efforts, text in African American English (AAE) and Hispanic-aligned English (Hisp) was filtered out at substantially higher rates (42 percent and 32 percent, respectively) than White-aligned English (WAE), at 6.2 percent.
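To see how a filter that blunt produces that kind of skew, here’s a sketch of the approach in Python: throw away any document that contains a word on a blocklist. The blocklist and sample documents below are invented for illustration; the point is that a single flagged word, however innocently it’s used, takes the entire document down with it, and any dialect or community that uses flagged words more often gets thrown away along with it.

```python
# Sketch of a blunt "bad word" filter of the kind C4 reportedly used: if ANY
# word in a document appears on the blocklist, the whole document is thrown
# away. The blocklist and sample documents are invented for illustration.
BLOCKLIST = {"damn", "sex", "hell"}

documents = [
    "Researchers discussed sex differences in heart-disease outcomes.",
    "The community health clinic expanded its hours this week.",
    "What the hell happened to the bus schedule, residents asked.",
]

def keep(document: str) -> bool:
    """Keep a document only if none of its words are on the blocklist."""
    words = {word.strip(".,!?'\"").lower() for word in document.split()}
    return BLOCKLIST.isdisjoint(words)

filtered = [doc for doc in documents if keep(doc)]
print(filtered)
# Only the clinic sentence survives. A medical paper and an ordinary civic
# complaint both vanish because one word tripped the filter.
```

Scale that up to a web-sized corpus and you get exactly the kind of lopsided removal rates described above, without anyone ever deciding to exclude those communities.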
I strongly suspect that C4’s designers didn’t have a clue what their filters were doing. But repeat after me: “We don’t know.” Worse still, they may not even know that they don’t know.
As the National Institute of Standards and Technology (NIST) pointed out in March 2022, before we all became so enamored of AI: “A more complete understanding of bias must take into account human and systemic biases…
“Systemic biases result from institutions operating in ways that disadvantage certain social groups, such as discriminating against individuals based on their race. Human biases can relate to how people use data to fill in missing information, such as a person’s neighborhood of residence influencing how likely authorities would consider the person to be a crime suspect. When human, systemic, and computational biases combine, they can form a pernicious mixture — especially when explicit guidance is lacking for addressing the risks associated with using AI systems.”
As NIST research scientist Reva Schwartz said, “Context is everything. AI systems do not operate in isolation. They help people make decisions that directly affect other people’s lives.”
So, before we turn over our lives and work to the god of generative AI, let’s consider its feet of clay, shall we? It’s nowhere near as accurate or reliable as far too many of us assume. And, while its results are only going to get more impressive over time, we’re not close at all to solving its bias and hallucination problems.