Artificial intelligence is big business, and it’s not going away. International Data Corporation predicts that worldwide revenue for the A.I. market will top $157 billion in 2020, growing to over $300 billion by 2024. A.I. is prolific, with estimates of over 403,000 conference papers on A.I. in 2019 alone. The number of A.I. deployments is almost unknowable. Yet most people don’t realize the impact of those deployments, how they are used to manipulate our behavior, and the disappearing need for even tacit consent.

The need for ethical oversight is growing.

Artificial intelligence makes many of our mundane tasks bearable. It’s ubiquitous, produces smarter phones, and makes searches personalized and instantaneous. When sifting through massive catalogs, it’s the most practical tool in industries like media and entertainment. During the US lockdowns, many of us didn’t care how Tiger King rocketed up our usually well-curated Netflix list with little effort.

A.I. provides shortcuts. When I enter the term “Giants” in Google, context matters. If I’m two miles from the Meadowlands arena, Google knows to surface the New York Giants schedule, their recent scores, and false hope for football fans. The probability of me wanting information on Giant Redwoods is low. In those instances, I’m happy the tool knows just enough about me to give me what I want quickly—all for the low, low price of eroding privacy.

Sure, we can opt out of giving more private information because our social security numbers are not required for a schedule. Nevertheless, the user’s presence alone is sometimes enough to accelerate the evolution of the A.I. industry—all with very little oversight.

Let’s use an analogy from machine learning.

You’re walking along a shore and find a set of footprints. Nothing ominous or even unique about the trail. The tide will soon come in, and the prints will fade and disappear. The moment will pass without the observer ever giving a thought about the person they once belonged to—the name, preferences, hobbies, political leanings, and more that combine to make up who we are. After all, they are just footprints.

Now imagine clicking on your camera phone. It can record the prints, utilize a little bit of AI-assisted computer vision to analyze the pattern, then extrapolate the walker’s gait. Based on the time of day and lighting, it estimates the depth of the footprint and, with some math, gives you a probability of the owner’s gender and age. It’s not a 23andMe profile, but it does begin to filter a significant portion of the population.

This is where we start to measure and label datasets. We may also note that the footprints were there in the morning after the tide, on a school day, which further reduces our set by the number of school-aged children in the area and excludes their parents. There was no sign of canes, walkers, or accessibility vehicles, narrowing the possibilities even more.


Let’s add some variables. The police heavily patrol the area, which is not typically accessible to the BIPOC population, further expanding what we can infer about the prints. Quickly, we’ve gone from a possibility of millions of people to a fraction of the population. If we segment the groups further, both the likelihood and accuracy of our predictions go up. Again, without a name.
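To make the narrowing concrete, here is a minimal sketch of the same idea in Python. Everything in it is an assumption made for illustration: the `Person` attributes, the tiny synthetic population, and the filter labels are hypothetical and are not drawn from any real dataset or tool. It only shows how each harmless-looking observation shrinks the pool of candidates, without a single name being collected.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Person:
    """A hypothetical, anonymous record. Note there is no name field."""
    age_band: str            # "child", "adult", or "senior"
    is_parent: bool
    uses_mobility_aid: bool
    lives_nearby: bool


def build_population():
    """Build a toy population: one synthetic person per attribute combination."""
    return [
        Person(age, parent, aid, nearby)
        for age, parent, aid, nearby in product(
            ["child", "adult", "senior"], [True, False], [True, False], [True, False]
        )
    ]


# Each filter pairs an observation from the beach with the rule it implies.
FILTERS = [
    ("school day: exclude school-aged children", lambda p: p.age_band != "child"),
    ("school day: exclude their parents", lambda p: not p.is_parent),
    ("no cane or walker marks", lambda p: not p.uses_mobility_aid),
    ("area is hard to reach: keep locals", lambda p: p.lives_nearby),
]


def narrow(population, filters):
    """Apply the filters in order and record how many candidates remain."""
    steps = [("everyone", len(population))]
    remaining = population
    for label, keep in filters:
        remaining = [person for person in remaining if keep(person)]
        steps.append((label, len(remaining)))
    return steps


if __name__ == "__main__":
    for label, count in narrow(build_population(), FILTERS):
        print(f"{count:>3} candidates left after: {label}")
```

In this toy setup each observation halves the pool or better. Real populations are not this tidy, but the direction is the same: every extra signal removes candidates.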

Many of the inferences we made so far might’ve been surmised at a dinner table at 221B Baker Street. So now imagine targeting the beachgoers a bit more intensely with a team of data scientists, sociologists, and digital archeologists. Can we figure out their likely brand preferences or loyalties, or what color hats they are wearing? Were there any anti-vaccination pins or easily discernible political leanings?

Obviously not…right? Given our hi-tech team, this information is far more accessible than you think.

We’ve learned a fair bit, all without engaging with the beachgoers, much less getting their consent. It’s still superficial information, which would be fine if we were simply interested in selling more hats to the crowd. But what happens if we are trying to sell another good, like information? Or when they stop being mere consumers of that information and become unknowing participants in testing? 

What’s out of bounds?

Let’s take a step back. In formalized social science settings where people are the subject of observation and potential experimentation, an ethics board is usually involved, a legacy of the Nuremberg Code that emerged from the Nuremberg Trials. The rules that followed are simple: People should know about and consent to experiments of which they are a part, and no harm should come to them. The participants should be able to end the experiment whenever they like.

Most A.I. algorithms in the wild never see that rigor. If we take my previously generalized framework, you’ll notice that some of my assumptions might be prone to bias. Do anti-vax pins influence the next series of questions, and are my conclusions fair? Those biases, at a minimum, play a role in data collection and framing what we’re trying to ascertain. At least with some academic ethics boards and just a little rigor, sometimes we get good peer reviews that call out these assumptions.

But assume we get it right, and every A.I. model and machine learning method is bias-free. How do we explain how we got there? Why did the banking and finance model predict that the higher-income Black family would default on their mortgage rather than the lower-income white family? Why did the legal model provide sentencing guidelines that far exceeded the judge’s recommendations? 

The lack of explainability is a problem the A.I. field is still grappling with, especially with the proliferation of Deep Learning methods that provide near black-box results. Consequently, how do we explain to every person involved (that’s you, me, and everyone else) that they’re subject to experimentation, and that their behaviors are influenced by subtle yet genuine forces that they can’t see?

Back on the beach, inferencing allowed us to achieve forensic-level approximations without collecting “intrusive” data about the footprints’ owner. And these simple approximations give us “good enough” accuracy between the members to make general approximations for predictions, which encourages us to do it again and again. 

These tactics are far from new. In her book about the rise of Simulmatics, Harvard historian Jill Lepore describes Project Macroscope, an attempt to create a data-centric computerized election project for the 1960 Presidential race that would seek to predict the behavior of voters derived from returns and public opinion surveys, or even their likely response to a particular speech. Lepore quotes a report by two of the lead data scientists:

We will, from our model, be able to predict what such a speech would mean to each of 1,000 sub-groups of the population, and how many individuals belonging to each sub-group there are in each state. We would therefore be able to predict the approximate small fraction of a percent difference that such a speech would make in each state and consequently to pinpoint the state where it could affect the electoral vote. We might thus advise, for example, that such a speech would lose 2 to 3% of the vote in several Southern states that we would carry anyhow, but might gain ½ of a percent of the vote in some crucial Northern state.

That was 60 years ago. We are 60 years further along that road.

Another example of early management by algorithm was recorded in 1967 when civil unrest hit the streets of LA and Detroit. The Simulmatics Corporation attempted and failed to create a riot prediction computer for the Johnson administration. 

Despite the failure, we still see their legacy in predictive policing.


Now forensic archeologists sometimes look at very disjointed pieces of information and evidence to extrapolate as much as they can about certain societies. Imagine giving them a tool to test their hypotheses in real-time through historical antiquity. They could test predictions to deliver sponsored ads allowing the resulting impact to reinforce, then shape their underlying beliefs of the world’s observable substrate. What if A.I.-driven disinformation was distributed in the wake of Pompeii? Caesar never really died; he won the election. Black Death? No, it’s only the flu. How would history have been affected by each of these disinformation campaigns?

A study by search engine DuckDuckGo found that Facebook tracks you on 36% of the top 50,000 websites, while Google tracks you on 86%. Even with the upcoming cookie-less world in 2023, your digital footprint will still be observable without the crumbs. It’s like being able to count the shadows on the wall without seeing their owners’ faces or, as in another anecdotal example, finding the correlation between consumption patterns of unscented lotion and pregnancies, then mailing diaper coupons to an unknowing family.

A.I. is in the wild, and there’s no curbing the appetite for its use, development, or sale. Countries like China, with fewer data protection practices, have the computing power and brute force to accelerate global adoption. We are in a race for A.I. dominance at nearly any expense. Economic expansion for non-US countries is growing dependent on their ability to innovate while unfairly reacting to the exploitive nature of first-world consumption. It’s not stopping. And with the advent of autonomous machine learning tools, which all but remove the need for data scientists to train models, the algorithmic boom is only beginning.

What’s missing is any real understanding by the public of the implications.

There are seemingly invisible forces at work that we shouldn’t take for granted. But they’re not hidden, and many of the flawed designs result from the sometimes purposeful exclusion of marginalized voices or the over-representation of the most powerful ones. Many of the implementations also bring repercussions that substantively, though mostly unintentionally, increase the divide, ignore equity, and often erode conscious calls for inclusion.

The propagation of A.I. and big data doesn’t just reinforce the learning within the models but helps cement the biases within the flower-scented maze. Algorithms carry the biases of their makers. They can’t do otherwise. As A.I. finds its way into every corner of our lives, we have a shared responsibility to educate ourselves and each other, then provide oversight of its uses—or risk our unmaking.

Alix is a Senior Technologist at Google, focused on helping customers reimagine an inclusive, sustainable future where innovation doesn't leave people behind.