Don’t you love it when a politician utters something so gobsmackingly wrong, there’s nothing left to say except, “Oh, please!”?! I have a degree in economics, and one of the few lasting takeaways from that academic experience is that data can be massaged to say practically anything. First, let’s use this perspective for a quick analysis.
The statement "Data does not lie" is misleading at best. While data itself may not be inherently deceptive, there are so many ways in which data can be misinterpreted, misrepresented, or manipulated to support false conclusions. The claim becomes even more erroneous once you consider the impact of biased training data on AI models.
Here are a few key reasons why this statement is balderdash:
1. Bias in data collection: The way data is collected can introduce biases that skew the results. Sampling bias, for example, produces data that does not represent the whole population, so any conclusions drawn from it are off from the start (see the first sketch after this list). In AI systems, biased training data perpetuates and amplifies that bias and bakes false associations into the outputs.
2. Data quality issues: Poor data quality, such as incomplete, inconsistent, or erroneous data points, can lead to faulty analyses and misleading conclusions.
3. Misleading data presentation: Data can be presented in ways that are misleading, such as using inappropriate scales, cherry-picking data points, or omitting important context. Data can be manipulated to emphasize certain aspects while downplaying others.
4. Correlation vs. causation: Data may show a correlation between two variables, but that does not imply a causal relationship. Misinterpreting correlation as causation is a common mistake (see the second sketch after this list), and omitting the real cause is easy; today's economic development figures, for example, may depend on decades of earlier infrastructure investment.
5. Lack of context: Data often requires context to be properly understood. Without the necessary background information or a clear understanding of the data's limitations, it's easy to draw incorrect conclusions.
6. Interpretation bias: The way data is interpreted can be influenced by the biases, preconceptions, intent, and motivations of the people/AI analyzing it. This can lead to different conclusions based on the same data.
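To make point 1 concrete, here's a minimal sketch in Python. Everything in it is made up for illustration: a town of 10,000 residents, 35% of whom support some policy, polled once honestly and once at a rally where supporters are four times as likely to show up.

```python
import random

random.seed(42)

# Hypothetical town: 10,000 residents, 35% actually support the policy.
population = [1] * 3500 + [0] * 6500

# Honest poll: 500 residents chosen uniformly at random.
fair_sample = random.sample(population, 500)

# Biased poll: taken at a rally, where supporters are 4x as likely to show up.
weights = [4 if person else 1 for person in population]
rally_sample = random.choices(population, weights=weights, k=500)

print(f"True support:           {sum(population) / len(population):.0%}")
print(f"Random-sample estimate:  {sum(fair_sample) / len(fair_sample):.0%}")
print(f"Rally-sample estimate:   {sum(rally_sample) / len(rally_sample):.0%}")
```

Same town, same opinions; only the collection method changed, and suddenly the "data" says roughly the opposite of the truth.

And here's point 4 in miniature. In this toy simulation (all numbers invented), ice cream sales and drowning incidents never influence each other; both are driven by summer temperature, yet they correlate strongly.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hidden common cause: daily temperature over one summer.
temperature = rng.uniform(15, 35, size=90)

# Two effects of temperature that never influence each other.
ice_cream_sales = 20 * temperature + rng.normal(0, 40, size=90)
drownings = 0.3 * temperature + rng.normal(0, 1.5, size=90)

r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(f"Correlation between ice cream sales and drownings: {r:.2f}")
# A strong positive correlation appears, even though banning ice cream
# would save exactly zero swimmers.
```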
While data all by itself may not "lie," the way it is collected, presented, and interpreted can lead to erroneous or misleading conclusions.
Now let’s look at this from a slightly different, more populist perspective on AI and data.
When we take another gander at this "Data does not lie" nonsense from the perspective of data bias in training data, it becomes painfully obvious that the statement is about as accurate as a drunk dart player. Biased training data can lead to more false outcomes than a politician's promise, and that's saying something!
First off, biased data perpetuates biases like a bad habit. If you train a model on data that's as skewed as a wonky table, it'll learn and spread those biases faster than a cat video on TikTok. Take facial recognition: train it on a dataset full of people from one demographic, and it'll perform worse on underrepresented groups than a tone-deaf singer at karaoke night.
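That claim is easy to reproduce in miniature. The sketch below is a toy stand-in for face recognition, assuming fabricated five-number "faces" and scikit-learn: the cue that predicts the label differs between two groups, and the training set is 95% group A.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_group(n, informative_dim):
    """Synthetic 'faces': 5 features; the label depends on one feature,
    and which feature matters differs between the two groups."""
    X = rng.normal(size=(n, 5))
    y = (X[:, informative_dim] > 0).astype(int)
    return X, y

# Training set: 95% group A (cue in feature 0), 5% group B (cue in feature 1).
Xa, ya = make_group(950, informative_dim=0)
Xb, yb = make_group(50, informative_dim=1)
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluate on fresh, equally sized test sets for each group.
Xa_test, ya_test = make_group(1000, informative_dim=0)
Xb_test, yb_test = make_group(1000, informative_dim=1)
print(f"Group A accuracy: {model.score(Xa_test, ya_test):.0%}")
print(f"Group B accuracy: {model.score(Xb_test, yb_test):.0%}")
```

The model aces the group it was fed and barely beats a coin flip on the other one, because it never saw enough examples to learn that group's cues.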
But wait, there's more! Machine learning models can amplify biases like a kid with a sugar rush and a microphone. Feed it historical biases, and it'll spit out predictions that are more lopsided than a wobbly bobblehead on a dashboard during a bumpy ride.
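That amplification bit isn't just a punchline. Here's a minimal sketch with invented approval rates: a "model" that simply predicts the most common outcome for each group turns a 30-point statistical tilt into an absolute, 100-point rule.

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented historical data: equally qualified applicants, but group A was
# approved 70% of the time and group B only 40% of the time.
approval_rate = {"A": 0.70, "B": 0.40}
history = [(g, rng.random() < approval_rate[g])
           for g in rng.choice(["A", "B"], size=10_000)]

# A 'model' that just predicts the majority outcome seen for each group.
def majority_model(group):
    outcomes = [approved for g, approved in history if g == group]
    return bool(np.mean(outcomes) > 0.5)  # approve iff most past cases were approvals

print("Model approves group A:", majority_model("A"))  # True  -> 100% approval
print("Model approves group B:", majority_model("B"))  # False ->   0% approval
# A 30-point gap in the data becomes a 100-point gap in the decisions.
```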
And don't even get me started on false correlations. Biased training data can make models see connections that are about as real as a unicorn in a tutu. Train a hiring algorithm on a gender-biased dataset, and it'll start associating random traits with successful candidates like a conspiracy theorist on a meth rush.
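Here's a toy version of that hiring story, with fabricated data and scikit-learn. A "hobby" feature has nothing to do with job performance, but because it tracks gender and the historical hiring labels were gender-biased, the model latches onto it even when the gender column is removed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 5000

# Fabricated applicants: skill is what *should* matter.
skill = rng.normal(size=n)
gender = rng.integers(0, 2, size=n)          # 0 or 1, irrelevant to the job
hobby = (rng.random(n) < np.where(gender == 1, 0.8, 0.2)).astype(float)
#      ^ an irrelevant trait that happens to track gender

# Biased historical labels: skill matters, but gender==1 got a big bonus.
hired = (skill + 1.5 * gender + rng.normal(0, 0.5, n) > 1.0).astype(int)

# Train WITHOUT the gender column -- 'fairness through unawareness.'
X = np.column_stack([skill, hobby])
model = LogisticRegression().fit(X, hired)

print("Weight on skill:", round(model.coef_[0][0], 2))
print("Weight on hobby:", round(model.coef_[0][1], 2))
# The hobby gets a hefty positive weight purely as a proxy for gender.
```

Dropping the sensitive column doesn't help when the bias can sneak right back in through a proxy.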
But here's the real kicker: if you use these biased models to make decisions in the real world, you'll be reinforcing societal biases like a pop-up ad that just won't quit. Unfair loan approvals, biased criminal sentencing, and discriminatory hiring practices will become the norm, and that's about as welcome as a "404 Error" message when you're trying to submit a job application.
So, what's the solution? Well, for starters, we need to be aware of biases in data like a hawk spotting its prey. We need to ensure our training data is as diverse as a United Nations meeting and collected from sources more reliable than a gossip magazine.
In conclusion, the statement "Data does not lie" is about as reliable as a broken clock: right twice a day and dead wrong the rest of the time.
And there you have it, two perspectives on one bit of political nonsense.
Jim Schweizer and Anthropic’s Opus LLM collaborated on this weekend edition of “Putting the AI in Aurora, AI.”