Overview:
Recent online platform changes have led to user protests. But can they be enough to combat the deeper problems in our privately owned public commons?
Anti-corporate-monopoly and digital rights advocate Cory Doctorow calls it the “enshittification” lifecycle. First, a company caters to a demographic, offering free services to build a user base worth selling to investors. Then it caters to business clients at cost to the original users (maybe by scraping their data, or maybe by clawing back free features after they’ve been hooked). Finally, it caters to itself, leveraging how much business clients now rely on its user data or pay-to-play features to jack pricing for stockholder profit.
Doctorow was writing about Amazon, Facebook/Meta, and TikTok, but the principle carries throughout our online tech sphere. This week, Reddit users staged a protest in which over 8,000 subreddits went dark in response to changes that will be implemented on July 1. Reddit plans to hike pricing for the application programming interfaces (APIs) that currently allow millions of users to interact with the site through third-party apps like Apollo and Narwhal.
This change runs deeper than simply trying to force users to access Reddit directly. The company has stated that, in order to improve income, it needs to be able to gather more user data it can sell for “artificial intelligence” (AI) training purposes. In a leaked memo reported by The Verge, CEO Steve Huffman assured his staff that the protests would pass. Some third-party apps are planning to shut down entirely at the end of this month rather than face the higher fees, and many subreddits are making the difficult decision as to whether to close permanently, so as not to be complicit in having user data scraped for profit in the burgeoning hype-cycle of AI technologies.
But is this form of user protest enough, when faced with privately owned forums that have become so crucial to mainstream online life?
For years now, there has been a “problem of search” connected to the enshittification of Google and similar search engines: when one now looks for a reliable answer to a simple query, one has to wade through promotional links given top rankings because they paid for the privilege. For a while, one quick fix was to add the word “reddit” to one’s search term, and take one’s chances with the answers found on relevant forums there. Not so anymore.
But Reddit isn’t the only platform trying to corral users into paying more to maintain access to basic site features without compromising their data. This week, Elon Musk promised that Twitter would move to a direct-messaging (DM) model in which only Twitter Blue users ($8 USD per month) would be able to initiate DMs. Musk stated that the move would combat the huge rise in spam DMs, because “it is increasingly difficult to distinguish between AI bots. Soon, it will be impossible.”
On a smaller platform, this feature can be a reasonable fix to the bot problem, but on a massive platform such as Twitter (which has at least 230 million active, monetizable accounts), it drastically transforms a key function users have come to expect over the site’s history. Namely, this ban would keep non-paying users from being able to connect with public authorities (government figures, prominent journalists, NGOs, municipal service providers) that have normalized this platform as a main point-of-contact with concerned citizens.
In short, the change wouldn’t be a problem if the wider public weren’t already hooked on using this private medium to shore up such critical public outreach work. But this is the world to which we’ve consigned ourselves, over the last few decades, through our lax approach to the regulation of online enterprise.
AI hype, and analog props
There’s a central reason, too, that online platform companies view human-data-scraping as a key component in their future business strategies: machine learning (ML) programs, and especially large language models, aren’t doing very well as they exhaust their current data sets. For years, we’ve known that most such programs degrade quickly as they reach the limits of human-generated inputs. In a study published earlier this year in Nature, researchers at Harvard, MIT, Cambridge, and the University of Monterrey found that 91 percent of ML programs underwent what’s called “AI aging”. This phenomenon doesn’t just include a degradation of overall output with time, but also an increase in erratic outcomes: that is, volatility in the production of either high- or low-quality results.
Many researchers use what’s called “synthetic data” as a stopgap, but this data (itself computer-generated) poses problems because it is much more difficult to sustain an effective balance between high and low quality inputs from such a source. That’s why, earlier this week, you might also have encountered reports about an impending “model collapse” for the AI hype-engine that’s been waxing poetic about the future of such tech for the last year. These observations accord well with what the OpenAI CEO himself said in April, about ChatGPT having already played out its best tricks, and the industry requiring greater innovation to move forward.
Last year, though, Analytics India offered this cheerful solution to the problem of low quality and otherwise insufficient data to keep the AI hype-engine churning:
Government or corporate players can facilitate the production of large quantities of data through widespread screen recordings, video surveillance, or video recordings (for example, self-driving cars) that continuously stream real-world data. For example, as Manu points out, with models like Whisper from OpenAI, transcribed videos can be considered as a source to feed more data into LLMs.
Ayush Jain, Analytics India Magazine, December 8, 2022
As sinister as that sounds, though, it’s also an everyday business strategy that many tech companies are leaning into, including Reddit, as they strive to optimize profit models on the backs of users so bereft of a proper public commons that no solid, widespread alternatives to engaging with ethically compromised technologies exist.
It should not be so difficult to build better secular spaces. However, thanks to legal exceptions granted to the online sphere, especially the immunity from civil liability that Section 230 of the US Communications Decency Act offers to online “publishers”, we occupy a stratified world in which full digital citizenship remains a pipe dream.
As with recent hype over megacity and “special economic zone” projects, the crux of our problem is the relentless siphoning of public funds into private ventures, where corporations gain state license to benefit from civilian data: often at cost to our democratic integrity, our capacity to control the spread of disinformation and protect against espionage, and of course our privacy.
The solution will not be easy, and sadly, Reddit CEO Steve Huffman is likely correct that, despite the most well-intentioned of short-term user protests, most people online will soon consign themselves to the new status quo, just as humans have settled for mediocrity on the online platforms that came before.
But this is a problem that merits more of our express attention and political activism going forward. At least, if ever we want to improve human agency in whatever form our fragile public commons hopes to take on next.