Free expression vs. proactive safety

Moderating open platforms at scale

On Tuesday, March 24th, I gave a presentation at the London Trust & Safety Summit on the topic of holding the tension between free expression (something that Automattic is deeply invested in defending) and proactive safety (something that is necessary for healthy platforms). The presentation received a lot of wonderful feedback, and really seemed to resonate with folks across the industry. I occasionally went a little off script to underscore an important point, but the following is what I wrote for this presentation.

I want to start with something a little uncomfortable.

Every single moderation decision your team makes is a values decision. Not a policy decision. Not a technical decision. A values decision. And most of us — if we’re honest — are making those decisions faster than we’ve ever actually articulated what our values are.

That’s not a critique. That’s just the reality of operating at scale. The queue doesn’t wait for your philosophy to catch up.

At Automattic, we’re the company behind WordPress.com and Tumblr. Together, those platforms host an enormous slice of the open web — personal blogs, journalism, creative communities, small businesses, organizing spaces, fan archives, art. We’re talking about publishing infrastructure for people who, in many cases, don’t have anywhere else to go.

And our founding mission — the thing that’s actually written down and taken seriously — is to democratize publishing. Make it accessible. Make it open.

That’s not marketing language. It shapes real decisions. It means we start from a posture of yes. Permissive by default.

And here’s the thing about that posture: it doesn’t make trust and safety easier. In some ways, it makes it harder. Because when you’re committed to giving people a voice, every removal has weight. Every enforcement action is, at some level, in tension with the thing you said you were for.

What I’ve come to believe — and what I want to explore with you today — is that this tension isn’t a problem to be solved. It’s the work. The platforms that are doing this well aren’t the ones that have figured out how to make the tension go away. They’re the ones that have learned to hold it clearly enough to act consistently inside it.

What ‘Free Expression Platform’ Actually Means Operationally

Let me try to make this concrete, because ‘free expression platform’ can mean a lot of things, and most of them are useless in an ops context.

What it actually means, operationally, is that your default is to leave content up. The burden of proof is on removal, not presence. You need a reason to remove that aligns with your values — not a reason to allow. You’re not looking for reasons to remove, and you’re not prescriptively defining what’s allowed. You’re encouraging free expression.

That sounds simple. It is not simple.

Because what it requires is that you’ve done the upstream work of being explicit about what those reasons actually are. What categories of harm are serious enough to override the default? What evidence threshold do you need before you act? What’s the difference between content you find objectionable and content that actually causes harm?

These are not questions you can answer in the moment, under pressure, at 2am when something is escalating. You have to have answered them before. And they have to be answered specifically enough that two different reviewers, looking at the same piece of content, reach the same conclusion most of the time. The goal is to cultivate good judgment grounded in shared, well-defined values, not to hand teams a checklist to work against.

I think we’ve all said or heard ‘we use our judgment.’ And what I mean when I say it is that we’ve hired thoughtful people and trust them. But judgment without a shared framework isn’t consistency. It’s variance. And variance, at scale, is its own kind of harm. It’s harmful to your users, it’s harmful to your moderators, and it’s harmful to your systems.
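
One way to make that variance visible: periodically double-assign a sample of items to two reviewers and measure how often they agree beyond chance. Below is a minimal sketch using Cohen's kappa, a standard inter-rater agreement statistic; the decision labels and sample data are hypothetical, not drawn from any real tooling.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Agreement between two reviewers, corrected for chance.

    1.0 is perfect agreement; 0.0 is no better than chance.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each reviewer labeled independently
    # at their own observed base rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[k] / n) * (freq_b[k] / n)
                   for k in freq_a.keys() | freq_b.keys())
    if expected == 1.0:
        return 1.0  # degenerate case: everyone used a single label
    return (observed - expected) / (1 - expected)

# Hypothetical double-review sample: the same ten items, two reviewers.
reviewer_1 = ["remove", "keep", "keep", "remove", "keep",
              "keep", "remove", "keep", "keep", "keep"]
reviewer_2 = ["remove", "keep", "remove", "remove", "keep",
              "keep", "keep", "keep", "keep", "keep"]
print(f"kappa = {cohen_kappa(reviewer_1, reviewer_2):.2f}")  # ~0.47
```

A kappa well below 1.0 on content you consider clear-cut is a sign the shared framework isn't doing its job yet.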

So the first practical thing I’d offer is this: if you haven’t done the work of writing down your values clearly enough that they function as actual decision criteria, that is the work. It’s not background. It’s not culture. It’s your enforcement infrastructure.

For us, some of that clarity comes from the mission itself. Democratizing publishing means we’re especially attentive to enforcement patterns that could have a chilling effect on marginalized voices — people who are disproportionately the ones who get reported, who get moderated, who lose access to platforms. Our posture toward free expression isn’t abstract. It has a constituency. And being clear about that actually makes some hard calls easier.

Not all of them. But some of them.

The Signal Problem — Why Scale Breaks Human Review

Okay. Let’s talk about scale. Because I think there’s a version of this conversation that jumps straight to ‘we used AI and it got better,’ and I want to resist that, because it skips the part that actually matters.

The problem that scale creates for trust and safety isn’t primarily volume. We can hire for volume. We can build queues. We can add capacity.

The problem is noise.

When your reporting surface is large enough, the signal — the content that represents genuine, serious harm — gets buried. Not because it isn’t there. Because the ratio between signal and noise degrades. And when that ratio degrades, your human reviewers start making worse decisions. Not because they’re bad at their jobs. Because they’re processing so much low-stakes material that their calibration drifts. Their threshold creeps. Things that should feel urgent start feeling routine.

This is the thing AI actually fixes, when it’s used well. Not the volume problem. The noise problem.

The frame I find most useful is: triage layer versus decision layer.

AI belongs in the triage layer. Its job is to surface the things that need human attention, faster and more consistently than a purely reactive reporting system can. Toxicity classifiers, risk signals, pattern detection — these aren’t making decisions. They’re making it possible for humans to make better decisions, by putting the right things in front of them at the right time.
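
To make that division of labor concrete, here is a minimal sketch. The signal names, thresholds, and queue tiers are hypothetical illustrations rather than a description of any production system; the structural point is that the triage layer only assigns priority, and nothing in it can remove content.

```python
from dataclasses import dataclass

@dataclass
class TriageResult:
    item_id: str
    priority: str   # which human review queue this lands in
    signals: dict   # why it landed there, shown to the reviewer

def triage(item_id: str, risk_scores: dict) -> TriageResult:
    """Route content to a human review queue based on model risk signals.

    The models only prioritize; every enforcement decision is made
    by a human reviewer downstream.
    """
    # risk_scores: hypothetical classifier outputs in [0, 1],
    # e.g. {"csam_match": 0.0, "threat": 0.91, "spam": 0.2}
    if risk_scores.get("csam_match", 0) > 0.5:
        priority = "immediate"   # legally mandated escalation path
    elif max(risk_scores.values(), default=0) > 0.85:
        priority = "urgent"
    elif max(risk_scores.values(), default=0) > 0.5:
        priority = "standard"
    else:
        priority = "low"         # still reviewed, just doesn't jump the queue
    return TriageResult(item_id, priority, risk_scores)
```

The design choice worth copying isn't the thresholds; it's that the return type is a queue assignment, not a verdict.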

The decision layer stays human. Full stop. Not because AI can’t produce outputs — it can. But because the decisions that matter in trust and safety are values decisions. And values accountability has to live somewhere. It can’t live in a model.

There’s a human cost here worth naming clearly, and I don’t want to pass over it without acknowledgement. Better signal triage means your reviewers are spending less time on low-stakes noise — but it also means that when they are reviewing, they’re looking at a higher concentration of genuinely violative content. The queue gets more efficient and more brutal at the same time. That’s a real moderator resilience challenge, and it belongs in any honest conversation about what AI-assisted moderation actually produces for the people doing the work.

Now — I want to be specific about what I mean by ‘surface things for human attention,’ because I think there’s an important distinction that often gets lost in this conversation, and it matters especially for platforms that are serious about free expression.

There are basically three different things people mean when they talk about AI in content moderation. The first is triage of reported content — AI helps you prioritize and process the queue faster. That’s useful, but it’s still reactive. You’re waiting for someone to report something before the system touches it.

The second is what I’d call behavioral pattern detection — AI looking across signals like account velocity, network relationships, posting patterns, engagement anomalies. Not reading every post, but watching how accounts behave. This is closer to fraud detection than surveillance. You’re not evaluating expression; you’re evaluating conduct. And it’s genuinely proactive in a way that’s consistent with a permissive-by-default posture, because you’re not presuming content is harmful — you’re noticing that something about how it’s being produced or amplified looks like a coordinated operation.

The third is content scanning at ingestion — matching everything against known harm signatures. PhotoDNA for CSAM is the clearest example. That model has narrow, legally grounded use cases where it’s not just appropriate but required. But it is a fundamentally different posture. It is closer to the panopticon. And for most categories of content on an open platform, it’s in real tension with what we said we were for.

When we talk about shifting toward proactive safety at Automattic, we’re talking about the second model — behavioral signals — not the third. And I want to be honest that we’re not fully there yet. We’re moving into it. We have capabilities in this space and we’re building more. But the direction is deliberate: proactive in terms of conduct patterns, not in terms of reading every post before a human has any reason to look at it.

That distinction is worth holding clearly. Because the question isn’t just what AI can technically do. It’s what kind of platform you’re building, and whether your infrastructure is consistent with the values you said were foundational.

What proactive behavioral detection looks like practically: you use classifiers not just to catch policy violations, but to surface emerging risk. Things that aren’t yet a breach but are trending toward one. Account behavior patterns that show up in the data before they show up in your reports.
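
Here is a toy sketch of what a conduct-level signal can look like, with invented event kinds and thresholds. Nothing in it reads the content of a post; it only looks at how the account behaves.

```python
from datetime import datetime, timedelta

def behavior_flags(events, now=None):
    """Surface conduct-level risk signals for one account.

    `events` is a list of (timestamp, kind) tuples, e.g.
    [(datetime(...), "post"), (datetime(...), "follow"), ...].
    Returns human-readable flags; a reviewer decides what they mean.
    """
    now = now or datetime.utcnow()
    last_hour = [e for e in events if now - e[0] < timedelta(hours=1)]
    flags = []

    # Posting velocity: hypothetical threshold, tuned per platform.
    posts = [e for e in last_hour if e[1] == "post"]
    if len(posts) > 60:
        flags.append(f"posting velocity: {len(posts)}/hour")

    # A burst of outbound follows often precedes spam or harassment waves.
    follows = [e for e in last_hour if e[1] == "follow"]
    if len(follows) > 100:
        flags.append(f"follow burst: {len(follows)}/hour")

    return flags
```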

That capability — being able to see something developing rather than responding after it’s fully formed — is genuinely new. And it’s worth being clear about what it requires: it requires that you’ve defined your risk categories specifically enough that a classifier can be trained against them. Which brings us back to the values-as-infrastructure point.

You can’t automate toward clarity you haven’t achieved.

Shifting Left — The Industry Is Already Late

I want to say something that I think is true and that we don’t say out loud enough in rooms like this.

We are behind.

Not in a ‘we could be doing better’ sense. In a ‘the cost of reactive-only moderation is already in your data, whether you’re measuring it or not’ sense.

Every cycle you spend enforcing after the harm has already occurred — that’s a user who got hurt, or left, or learned that your platform doesn’t protect them. Reactive moderation isn’t a starting point on a maturity curve. It’s a debt that compounds.

Shifting left means moving your intervention earlier. Before the policy breach, if possible. Before the harm is fully formed, certainly. And this is where I want to push back a little on how AI often gets positioned in this conversation — as a detection layer, a filter, a thing that catches bad content faster.

That’s true. But it’s not the most important thing AI can do for you.

The more valuable capability is using AI to surface emerging risk — patterns that aren’t yet violations, behaviors that are trending toward harm before they’ve crossed a line. That is a genuinely different posture. It’s not reactive moderation running faster. It’s a fundamentally different relationship to time.

But here’s what that requires, and this is the part that I think organizations underestimate: you cannot deploy AI proactively toward values you haven’t yet articulated. Classifiers need to be trained against something. Risk models need to be oriented toward something. If your policy framework is vague enough that two reviewers reach different conclusions on the same content, your AI is just automating that inconsistency at scale.

The values work and the AI work are not sequential. They’re the same work. Getting specific about what you’re protecting — specific enough to function as actual decision criteria, not just principles — is what makes proactive AI-assisted safety possible.

Platforms that treat these as separate tracks, building the technical infrastructure while planning to ‘sort out the policy side eventually,’ are going to find that the infrastructure doesn’t point anywhere useful. You can’t automate toward clarity you haven’t achieved.

There’s also an accountability dimension here that matters for this audience specifically. When your AI surfaces something as elevated risk, and a human acts on that signal — someone in your organization made a values call. The model surfaced it; a person decided. That chain has to be legible. Not just for legal reasons, though those matter. But because values accountability is what distinguishes a platform that has integrity from one that just has enforcement.

The Adversarial Reality — AI on Both Sides

I want to spend a few minutes on something that I think sometimes gets framed as futurism, when it’s actually just current reality.

Bad actors are already using AI. That’s not a concern about the future. That’s a description of the present.

What that means practically: coordinated inauthentic behavior is more automated. Synthetic content — fake reviews, generated spam, AI-produced harassment campaigns — is higher volume and harder to distinguish at the surface level. Evasion tactics are more sophisticated. And the time between a platform developing a detection capability and bad actors developing a countermeasure is shrinking.

This is not a reason to panic. It’s a reason to be clear-eyed about what the operational reality requires.

First: your detection has to iterate faster than your adversaries. That’s a structural point, not just a technical one. It means you need a team that treats adversarial adaptation as a core part of the work, not an edge case. It means you’re analyzing evasion patterns, not just violations. It means your classifiers get retrained on a timeline driven by adversarial behavior, not just by your internal roadmap.
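
One hedged sketch of what that cadence can mean in practice: compare the model's recent calls against the human decisions that followed them, and let a jump in the miss rate trigger retraining rather than the calendar. The window and tolerance here are illustrative assumptions.

```python
def should_retrain(recent_pairs, baseline_miss_rate, tolerance=0.05):
    """Trigger retraining when the model starts missing what humans catch.

    `recent_pairs` is a list of (model_flagged: bool, human_actioned: bool)
    for content reviewed in the last window. A rising share of items that
    humans action but the model never flagged usually means adversaries
    have adapted to the current model.
    """
    missed = sum(1 for flagged, actioned in recent_pairs
                 if actioned and not flagged)
    actioned_total = sum(1 for _, actioned in recent_pairs if actioned)
    if actioned_total == 0:
        return False  # no signal either way this window
    miss_rate = missed / actioned_total
    return miss_rate > baseline_miss_rate + tolerance
```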

Second: the asymmetry matters. A bad actor only has to find one gap. You have to close all of them. That asymmetry doesn’t go away with better tooling. What it means is that you have to layer your defenses, so that no single failure point is catastrophic. AI-assisted triage, human decision-making, behavioral signals, pattern detection across accounts — each layer catches what the others miss.

Third, and I think this is underappreciated: AI also helps you think about the full shape of an operation, not just individual pieces of content. Human reviewers look at content. AI can see patterns across accounts, across time, across behaviors. The thing that looks ambiguous in isolation often looks unambiguous as part of a pattern.
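
As a toy illustration of that point: fingerprint a normalized version of each post and group accounts that keep producing the same text. One near-duplicate means nothing; fifty accounts sharing a fingerprint is a signal worth a human's attention. The normalization here is deliberately crude and purely illustrative.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(text: str) -> str:
    # Crude normalization so trivial evasion (case, spacing,
    # punctuation swaps) maps to the same fingerprint.
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

def coordinated_clusters(posts, min_accounts=5):
    """Group accounts by repeated identical-content fingerprints.

    `posts` is an iterable of (account_id, text). Returns clusters of
    distinct accounts posting the same normalized text — a conduct
    signal for human review, not an automatic verdict.
    """
    by_fp = defaultdict(set)
    for account_id, text in posts:
        by_fp[fingerprint(text)].add(account_id)
    return {fp: accounts for fp, accounts in by_fp.items()
            if len(accounts) >= min_accounts}
```

In practice you'd want fuzzier similarity than exact hashing (shingling, embeddings), but the structure is the same: the unit of analysis is the cluster, not the post.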

The platforms that are not building AI into their trust and safety infrastructure are not staying neutral. They’re falling behind. And ‘behind’ in an adversarial environment doesn’t mean slow. It means losing.

Conclusion

I want to close with something that’s less about tools and more about what this all requires of us as a function.

The shift to proactive, AI-assisted safety isn’t primarily a technical transformation. It’s a judgment transformation. It requires T&S leaders who can hold the values tension clearly enough to act consistently inside it — who can explain tradeoffs to executives without oversimplifying, who can build frameworks that are specific enough to function but flexible enough to evolve.

It requires, honestly, a level of intellectual clarity about your own values that a lot of organizations haven’t necessarily been forced to develop yet. Because reactive moderation lets you defer that work. You can show up to the thing that’s in front of you and handle it. Proactive moderation doesn’t let you defer. It forces the question: what are we actually for?

Automattic’s answer is: we’re for an open web where more people have a voice, where publishing isn’t a privilege, where the platforms that carry democratic discourse are also the ones that protect their communities. That’s not a slogan. That’s a decision framework.

The most important thing I’d leave you with is this: the tension between free expression and proactive safety is not a problem waiting for a solution. It’s the permanent condition of operating an open platform with integrity. The goal isn’t to resolve it. The goal is to be honest enough about it — with your teams, with your users, with yourselves — that you can act well inside it, consistently, at scale.

That’s the work.

Thank you.
