It’s easy to forget that Wikipedia is one of the most powerful sources of information on the internet: it is the most cited source online and one of the most trusted. For many people, it’s the first place they go to learn about topics ranging from quantum mechanics to political history.
But it’s not just humans reading Wikipedia anymore. LLMs absorb Wikipedia as part of their training data. Because Wikipedia offers a high-quality, relatively neutral body of knowledge, it carries real weight in how models respond: it made up about 3% of GPT-3's pretraining mix. But there’s an issue: if Wikipedia content is subtly manipulated to reflect misinformation or bias, those edits don’t just shape public perception. They get baked into the AI systems that hundreds of millions of people use every day. This post is about how that can happen and why it matters.
How Wikipedia Does Content Moderation
Wikipedia’s moderation system is fascinating. Anyone can make an edit, but a network of bots and administrators keeps content up to standard. Trusted editors can gain privileges, such as reverting vandalism, protecting pages, or blocking users. Bots monitor patterns, admins intervene in disputes, and all Wikipedia activity (edits, reverts, blocks) is publicly logged.
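Because that activity log is public, anyone can inspect it programmatically. Below is a minimal sketch that pulls the most recent changes through the MediaWiki API's recentchanges list; the endpoint and parameters are real, but the script is only an illustration and skips pagination and error handling.

```python
# Minimal sketch: read Wikipedia's public activity stream via the MediaWiki API.
import requests

API = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "list": "recentchanges",
    "rcprop": "title|user|timestamp|sizes",  # who edited what, when, and by how much
    "rclimit": 20,                           # just the 20 most recent changes
    "format": "json",
}

resp = requests.get(API, params=params, headers={"User-Agent": "edit-log-demo/0.1"})
resp.raise_for_status()

for change in resp.json()["query"]["recentchanges"]:
    # oldlen/newlen come from rcprop=sizes; log entries may not carry them.
    delta = change.get("newlen", 0) - change.get("oldlen", 0)
    print(f'{change["timestamp"]}  {change.get("user", "?"):<25} {delta:+6d}  {change["title"]}')
```

The same API also exposes administrative logs, such as blocks and deletions, through the logevents list, which is what makes the transparency described above more than a slogan.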
Unfortunately, Wikipedia’s community-based editing model, its greatest strength, can also be a vulnerability. While the community is quick to correct obvious vandalism, more coordinated and subtle manipulation can slip through. In recent years, there’s been growing evidence of state-sponsored editors (particularly from countries like China and Russia) quietly shaping narratives by rewriting Wikipedia articles. These are strategic, long-term efforts: rewording descriptions of contested territories, omitting negative facts, and citing sources that align with state narratives. And because the edits often look reasonable, such as rewording “invasion” to “military presence” or replacing independent sources with state-run media, they’re much harder to detect.
Subtle Propaganda is Still Propaganda
This kind of editing doesn’t look like misinformation in the traditional sense. It’s about recasting a country’s actions and image in a more favorable light. Governments have figured out that they don’t need to break Wikipedia to influence it. They just need to make small, plausible edits over time that collectively shift how events, people, or regions are understood. The changes often fly under the radar, especially when they’re framed as “clarifications” or supported by seemingly “credible” sources.
One report documented a flood of edits on China-related articles: flipping the framing of the Hong Kong protests, softening descriptions of Taiwan’s sovereignty, and reframing the Dalai Lama’s identity. In Russia’s case, analysts have traced efforts to control how the war in Ukraine is described, especially on the Russian-language version of Wikipedia.
There’s a term for this: entryism. It’s a political strategy of embedding yourself within an institution, not to break it, but to gradually steer it from within, and some governments now apply it to Wikipedia.
The LLM Angle: When Bias Gets Baked In
Wikipedia isn’t just a place people go to learn. It’s a foundational source for the datasets that train LLMs. As I mentioned before, Wikipedia made up about 3% of GPT-3's pretraining mix. And while that may sound small, it’s dense, high-quality content, and training pipelines tend to weight it more heavily than raw web text.
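To make the weighting point concrete, here’s a toy sketch of how a pretraining mixture can upweight a source relative to its raw size. The weights echo the publicly reported GPT-3 mixture, but the sampler itself is purely illustrative and not any lab’s actual pipeline.

```python
# Toy sketch: a corpus's influence is set by its sampling weight, not its byte count.
import random

# (source, sampling weight) -- the probability of drawing from that source,
# independent of how many raw tokens it contains. Weights mirror the figures
# reported for GPT-3 (Common Crawl 60%, WebText2 22%, Books 8% + 8%, Wikipedia 3%).
mixture = [
    ("common_crawl", 0.60),
    ("webtext2", 0.22),
    ("books1", 0.08),
    ("books2", 0.08),
    ("wikipedia", 0.03),  # small share of the mix, but sampled over multiple epochs
]

def sample_source(rng: random.Random) -> str:
    """Draw one source according to the mixture weights."""
    sources, weights = zip(*mixture)
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample_source(rng) for _ in range(100_000)]
print({source: draws.count(source) / len(draws) for source, _ in mixture})
```

The takeaway is that a small, trusted corpus like Wikipedia can be revisited several times during training while the bulk of the scraped web is seen less than once, so its phrasing leaves a disproportionate mark on the model.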
If subtle propaganda is added to Wikipedia, there’s a real risk it will end up baked into the knowledge of LLMs. Not as something the model outright believes, but as something it’s more likely to generate, repeat, or lean toward. Recent reports on disinformation networks targeting LLMs talk about “grooming” training data—planting false or slanted narratives across websites, blogs, and encyclopedias that LLMs are likely to ingest. Wikipedia is an obvious target.
What Can Be Done?
Wikipedia isn’t helpless here. The platform has community governance, admin tools, bots, and visibility into every single edit. Bad actors do get banned. Coordinated manipulation campaigns have been detected and shut down. Every page’s edit history is public, and with enough eyes on it, subtle distortions can be caught.
But this kind of moderation is fundamentally reactive rather than proactive. It depends on people noticing a pattern, being motivated to dig through edit logs, and pushing for accountability. And when edits are technically accurate but selectively framed, it’s even harder to make the case that something is “wrong.”
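As a concrete example of what “digging through edit logs” looks like in practice, here’s a rough sketch that pulls the two most recent revisions of a page and diffs them. The MediaWiki revisions API and its parameters are real; the chosen page and the workflow are just for illustration.

```python
# Rough sketch: fetch the two latest revisions of a page and show what changed.
import difflib
import requests

API = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Wikipedia",          # any article works; this one is just an example
    "rvprop": "content|timestamp|user",
    "rvslots": "main",
    "rvlimit": 2,                   # the two most recent revisions
    "format": "json",
    "formatversion": 2,
}

resp = requests.get(API, params=params, headers={"User-Agent": "revision-diff-demo/0.1"})
resp.raise_for_status()
revs = resp.json()["query"]["pages"][0]["revisions"]
new, old = revs[0], revs[1]         # revisions are returned newest first

print(f'Change by {new["user"]} at {new["timestamp"]}:')
diff = difflib.unified_diff(
    old["slots"]["main"]["content"].splitlines(),
    new["slots"]["main"]["content"].splitlines(),
    lineterm="",
    n=0,
)
# Print only added/removed lines, skipping the "+++"/"---" file headers.
for line in diff:
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
        print(line)
```

Tooling like this surfaces what changed, but deciding whether a plausible-sounding rewording is manipulation still takes human judgment, which is exactly why this kind of moderation stays reactive.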
For LLM developers, this raises deeper questions. How do we know what’s in the training data? How do we filter out subtle narrative manipulation that isn’t easily flagged? And what happens when a model starts to confidently explain a biased version of history?
Final Thoughts
Wikipedia is still one of the most important sources of information on the internet. It is a constantly evolving, mostly accurate snapshot of global knowledge. But like everything public, it’s vulnerable to strategic influence.
As LLMs become more powerful and deeply embedded into everyday life, the quality of their training data matters more than ever. We can’t address model bias without paying attention to the sources it learns from. If subtle distortions make their way into Wikipedia, they don’t just shape what people read; they shape what machines learn.
Thanks for reading!
"For LLM developers, this raises deeper questions. How do we know what’s in the training data? How do we filter out subtle narrative manipulation that isn’t easily flagged? And what happens when a model starts to confidently explain a biased version of history?"
Maybe that's where human feedback comes in? We should not forget that at the end of the day, LLMs just generate text.