Crowdsourcing moderation without sacrificing quality

(Previously: optimizing the news feed)

Online discussions consist mostly of uninteresting filler, with a sprinkling of thoughtful and valuable content that I’d like to engage with. The net result is that I usually avoid them. This is a shame, because in principle the internet offers a really great opportunity to increase access to and participation in useful discussions. It’s easy to focus on the fora that were once great and have since decayed, but I think that the real prize is the much-higher-quality fora that have never existed.

Some people consider the shortcomings of online discussion a tragic fact about humanity. I’m more inclined to see it as a tragic fact about online discussion platforms. Fortunately, facts about discussion platforms are much easier to change than facts about humans.

(Some of these ideas came from discussions with John Salvatier and especially Andreas Stuhlmüller, and many of them seem to be in the air more broadly. See also dialog markets.)

A basic proposal

If a trusted curator/moderator had an unlimited amount of time, they could look at every post and make judgments like:

  • Which posts should be displayed most prominently?
  • Which posts should be visible at all, vs. being hidden behind a “click to see more”?

I think that if Scott did this for all comments on Slate Star Codex, it would be a happier place. If Eliezer did this for all comments on LessWrong then we probably wouldn’t be speculating about why it’s such a mess.

Here is a possible substitute, which could save the moderator a whole lot of time:

  • Provide the moderator with tools to hide comments/threads, or to make them be displayed more/less prominently.
  • Provide other users with tools to express judgments about comments; maybe they are the same recommendations that the moderator can make, or maybe they can indicate other judgments like funny/true/useful/annoying/etc.
  • Sometimes query the moderator: “How would you change the presentation of this post? Would you hide it / promote it / demote it / do nothing?” The moderator can view the surrounding discussion and input from other users.
  • Train a machine learning model to predict the moderator’s responses to queries, based on the input from users (and perhaps based on other features of the comments/threads, such as who posted them, length, etc.)
  • Rather than relying on the moderator to actually moderate, use the model to predict what the moderator would do.
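
The loop above can be sketched in a few lines. This is only an illustration under strong assumptions: user input is reduced to per-comment counts of two hypothetical reaction types (`useful`, `annoying`), the moderator's answer is a binary show/hide, and a tiny hand-rolled logistic regression stands in for whatever model would really be used. All names and data are made up for the example.

```python
import math

# Hypothetical feature layout: each comment is summarized by counts of user reactions.
FEATURES = ("useful", "annoying")


def featurize(reactions):
    """Turn a dict of reaction counts into a fixed-length vector with a leading bias term."""
    return [1.0] + [float(reactions.get(name, 0)) for name in FEATURES]


def predict_show_probability(weights, reactions):
    """Predicted probability that the moderator would keep the comment visible."""
    z = sum(w * x for w, x in zip(weights, featurize(reactions)))
    return 1.0 / (1.0 + math.exp(-z))


def train(labelled, epochs=2000, lr=0.1):
    """Fit logistic regression by stochastic gradient descent.

    `labelled` is a list of (reactions, moderator_says_show) pairs, i.e. the
    queries the moderator actually answered.
    """
    weights = [0.0] * (len(FEATURES) + 1)
    for _ in range(epochs):
        for reactions, show in labelled:
            x = featurize(reactions)
            error = predict_show_probability(weights, reactions) - (1.0 if show else 0.0)
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# The few comments the moderator was asked about (made-up data).
moderator_answers = [
    ({"useful": 3, "annoying": 0}, True),
    ({"useful": 2, "annoying": 1}, True),
    ({"useful": 0, "annoying": 3}, False),
    ({"useful": 1, "annoying": 4}, False),
]
weights = train(moderator_answers)

# Comments the moderator never saw are presented according to the prediction.
unseen_good = {"useful": 5, "annoying": 0}
unseen_bad = {"useful": 0, "annoying": 5}
show_good = predict_show_probability(weights, unseen_good) > 0.5
show_bad = predict_show_probability(weights, unseen_bad) > 0.5
```

A real system would use richer features (who reacted, who posted, length, thread context) and more than two possible actions, but the shape is the same: moderator answers are the labels, user input supplies the features.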

I’ll tentatively call this arrangement “virtual moderation.” I think it will probably eventually work well.

(In general I think it would be best to not display karma/likes on posts, and to not use user input in mechanisms other than virtual moderation, but this is a detail.)

In the best case, virtual moderation would work as well as the moderator moderating everything. On its own, all it can do is reduce the moderator’s labor. But if we saved enough of the moderator’s time, we could potentially use other mechanisms to convert “moderator time” into “quality”:

  • The moderator could separately implement “aggressive” and “passive” policies, and users could choose between the aggressively-moderated and passively-moderated views of the discussion. I suspect that many moderators would feel more comfortable being aggressive in “aggressive” mode, and that most users would end up preferring that view.
  • The best moderators could moderate much larger domains.
  • Several moderators could moderate the same discussions, with each user picking their favorite.
  • We could give the moderator more time or resources to reflect, or use complex processes as moderators (e.g. peer review).
  • We could give the moderator more tools, e.g. providing incentives for high-quality contributions.

Note that if the community can’t do the work of moderating, i.e. if the moderator is the only source of signal about what content is worth showing, then this can’t work. We are at best exploiting initially-untrusted cognitive labor, not reducing the total amount of cognitive labor required. (Though redistributing the cognitive labor can be a huge win, because once you are reading a comment it is much cheaper to do the labor of evaluating it.) If you had powerful enough ML you could replace human cognitive labor with machine cognitive labor, but that’s not the initial value add.


A key question is: just how little work can the moderator get away with?

I’m optimistic that they would only have to answer a few queries, perhaps a few hundred or thousand over a decade-long lifespan of a blog. I’m not sure whether existing semi-supervised learning is up to this task, but I think it’s an interesting/tractable/important ML problem.

If some user’s judgment is very predictive of the moderator’s judgment, then they can basically be an additional moderator—in particular we can use their judgments as data to identify other predictive features (as well as directly updating the moderated view based on their recommendations).

More generally, we can bootstrap to more and more complex+widely applicable predictors. First we find simple/sparse predictors that are highly accurate but aren’t often available; then we find more complex/dense predictors that are reasonably accurate at predicting the simple predictors; then we keep going…
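
One very crude way to picture this bootstrapping, assuming every judgment is a binary show/hide vote: start with the moderator's handful of labels, trust any user who agrees with them often enough, let the majority vote of trusted users label further comments, and repeat. The function name, thresholds, and data below are hypothetical, and a real learner would weight judges rather than apply a hard cutoff.

```python
def bootstrap_trusted(moderator, users, threshold=0.9, min_shared=2, rounds=3):
    """Grow a set of trusted judges outward from the moderator's own labels.

    moderator: dict comment_id -> bool (the moderator's sparse judgments)
    users:     dict user_name -> dict comment_id -> bool
    Returns (trusted_user_names, labels), where labels includes proxy labels.
    """
    labels = dict(moderator)
    trusted = set()
    for _ in range(rounds):
        # Step 1: trust users who agree with the current labels often enough.
        for name, votes in users.items():
            shared = [c for c in votes if c in labels]
            if len(shared) >= min_shared:
                agree = sum(votes[c] == labels[c] for c in shared) / len(shared)
                if agree >= threshold:
                    trusted.add(name)
        # Step 2: majority vote of trusted users labels comments the moderator never saw.
        tallies = {}
        for name in trusted:
            for comment, vote in users[name].items():
                if comment not in moderator:
                    tallies.setdefault(comment, []).append(vote)
        for comment, votes in tallies.items():
            ups = sum(votes)
            if ups * 2 != len(votes):  # leave exact ties unlabelled
                labels[comment] = ups * 2 > len(votes)
    return trusted, labels


moderator = {"c1": True, "c2": False}
users = {
    "alice": {"c1": True, "c2": False, "c3": True, "c4": False},
    "bob": {"c3": True, "c4": False, "c5": True},       # never overlaps with the moderator
    "mallory": {"c1": False, "c2": True, "c5": False},  # disagrees with the moderator
}
trusted, labels = bootstrap_trusted(moderator, users)
```

Here alice is trusted because she matches the moderator directly, and bob is trusted only in the second round because he matches alice; mallory never qualifies. The result is a label for `c5`, a comment that neither the moderator nor alice judged.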

These behaviors might fall out of existing approaches to semi-supervised learning, but I suspect that we would actually need to develop new semi-supervised learning algorithms (e.g. along these lines). I don’t know how efficiently this could work.

Theoretically, the moderator needs to remain active in order to prevent abuse (and to be available to moderate any content with small probability). Otherwise a dishonest user could make useful judgments early in the system’s lifetime, and then once the moderator leaves they could start making bogus judgments.

In practice I would not expect the requirement for continuing moderator involvement to be a problem:

  • A sensible active learning algorithm would query the moderator when there was dissent amongst previously-predictive signals; so for the most part, the moderator could prevent manipulation by stepping in when something contentious happens.
  • Attackers can’t pursue a strategy of “following” the moderator and copying their judgments: we are learning a model that predicts the moderator’s judgments, not one that retrodicts them.
  • Even if the moderator leaves entirely, the difficulty of manipulating the system is radically higher than the difficulty of manipulating existing discussion platforms.
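
The first point, querying the moderator when previously-predictive signals disagree, could look roughly like the sketch below. It assumes each judge has a known historical accuracy at predicting the moderator and that each judgment is a binary show/hide vote; the function name and both thresholds are placeholders, not tuned values.

```python
def should_query_moderator(votes, reliability, min_reliability=0.8, dissent_threshold=0.25):
    """Decide whether a comment is contentious enough to send to the moderator.

    votes:       dict judge -> bool (does this judge think the comment should be shown?)
    reliability: dict judge -> past accuracy at predicting the moderator, in [0, 1]
    Only judges who have been predictive in the past count towards dissent.
    """
    weighted = [
        (reliability[judge], vote)
        for judge, vote in votes.items()
        if reliability.get(judge, 0.0) >= min_reliability
    ]
    total = sum(weight for weight, _ in weighted)
    if total == 0:
        return False  # no trusted signal at all; this sketch falls back to a default policy
    share_show = sum(weight for weight, vote in weighted if vote) / total
    dissent = min(share_show, 1.0 - share_show)
    return dissent >= dissent_threshold


reliability = {"alice": 0.95, "bob": 0.9, "carol": 0.85, "mallory": 0.4}

# Trusted judges agree; the unreliable judge's objection is ignored.
unanimous = should_query_moderator({"alice": True, "bob": True, "mallory": False}, reliability)

# Trusted judges split, so the moderator gets asked.
contested = should_query_moderator({"alice": True, "bob": False, "carol": True}, reliability)
```

Because only historically predictive judges count, a newly created account cannot force a query (or suppress one) just by voting.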


Here is a hypothetical roadmap, with terrible time estimates:

  • Search for some relevant dataset to test the idea with. It would be great if we could get e.g. all LW posts + upvotes + downvotes (including the identity of the upvoters/downvoters, though I expect this is tough for privacy reasons), or perhaps similar data for some subreddit.
  • Spend another 5-10 hours searching for other problems and considerations; this may result in a revised roadmap. (This might get interleaved with the next step, since seeing how the ML works would be a useful input.)
  • Spend another 40-200 hours working on the underlying semi-supervised learning problem and seeing how well you could do. (This might get interleaved with the next step.)
  • If everything looked good, start exploring possible arrangements with rationalist-adjacent discussion fora (LessWrong, EA Forum, Arbital Discussion [?], Slate Star Codex, new platform for an existing forum based on a fresh WordPress/Reddit fork…). Think about ease of implementation, interest of administrators, possible upside, natural moderators…
  • Spend another 60-600 hours implementing the machinery to elicit moderator judgments, sort/hide/annotate comments based on model outputs, collect relevant data, train and run the model. (At this step, it might get handed off to people who are doing other development work on the forum.)
  • If it looks good, expose it as an option for users.
  • See how the option goes over with users, continue improving the ML, etc. If everything looks good, eventually promote to a default, expand, and so on.

Perhaps the early steps could be done by an EA-hobbyist, and if they looked promising then the later steps could be done by EA contractors funded by EA donations. I’d guess the cost of the whole thing would be on the order of $100k. In reality I expect you’d probably bail unless the early steps looked quite promising.

I would probably purchase certificates of impact for any steps that look like they had been competently executed, especially a demo of the ML with some realistic data set (but would also pay for just the dataset, or for a prototype implementation e.g. in a fork of the LW or reddit codebase).


(This section is probably not relevant in the near future, but might help explain why I’m enthusiastic about this overall approach. It also helps bridge the gap with my last post.)

In the very long run, I think that people should see the content that they would want to see, not the content that the moderator would want to see. Moderation is useful in two ways:

  • We want people to have a shared view of what is going on. In some cases it would be silly if everyone saw a different subset of the discussion.
  • We want people to get sensible results the first time they visit a forum, not after they’ve given hundreds of upvotes/downvotes.

In effectively threaded conversations, I think it’s fine for the ordering of non-responses to vary independently across different people, so the first issue isn’t serious.

In these cases, I suspect the correct behavior is to initially rely on the moderator’s predicted judgments, and to shift towards using the user’s predicted judgments as we collect enough data to make good predictions.
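
One simple way to express that shift, offered as an assumption rather than a worked-out proposal, is to treat the moderator's prediction as a prior that counts for a fixed number of pseudo-observations. The `prior_strength` parameter and its default of 50 are invented for illustration.

```python
def blended_score(moderator_score, personal_score, n_user_judgments, prior_strength=50):
    """Mix the moderator-based and user-based predictions for one comment.

    With no data about the user, the moderator's prediction is used as-is; the
    personal prediction's weight n / (n + prior_strength) approaches 1 as the
    user supplies more judgments.
    """
    weight = n_user_judgments / (n_user_judgments + prior_strength)
    return (1.0 - weight) * moderator_score + weight * personal_score


new_visitor = blended_score(0.8, 0.2, n_user_judgments=0)     # moderator's view only
regular = blended_score(0.8, 0.2, n_user_judgments=50)        # an even mix
long_timer = blended_score(0.8, 0.2, n_user_judgments=5000)   # almost entirely personal
```

A larger `prior_strength` keeps the user on the moderator's view for longer before their own predicted judgments take over.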

We could also give users the option to switch between their personal view, the moderator’s view, this composite view, etc. (along with choosing different moderators, different “personalities” for the moderator…)

If the same prediction system were used many places on the internet, then this could be improved further. When someone visits a new site, their learning algorithm can predict what moderator+personality is the best predictor of the user’s judgments, and how strong the prior should be on the moderator’s judgment, and whether other adjustments would be helpful. This is kind of like meta-moderation, but in principle there is no distinction between the object and meta level: “predicted recommendation of the local moderator” is just another feature that can be used by the user’s personal learning algorithm.

A nice fact about personalization is that everyone can run their own learning algorithm locally and privately (e.g. as part of a browser extension, or the browser itself); there is no required centralization of the learning, just one API for providing model features and another API for using the predicted judgments to update the user interface.
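
To make the two-API split concrete, here is a hypothetical sketch: one interface hands out features for a comment, another accepts a score used to update the display, and the learning sits between them on the user's machine. The interface names, method names, and feature keys are all invented for this example and do not correspond to any existing browser or forum API.

```python
from typing import Callable, Dict, Iterable, Protocol


class FeatureSource(Protocol):
    """Site-side API: exposes features for a comment (votes, moderator prediction, ...)."""

    def features(self, comment_id: str) -> Dict[str, float]: ...


class Presenter(Protocol):
    """UI-side API: accepts a predicted judgment and updates how the comment is shown."""

    def set_prominence(self, comment_id: str, score: float) -> None: ...


def personalize(
    comment_ids: Iterable[str],
    source: FeatureSource,
    presenter: Presenter,
    local_model: Callable[[Dict[str, float]], float],
) -> None:
    """Run the user's private model between the two APIs; nothing is sent back to the site."""
    for comment_id in comment_ids:
        presenter.set_prominence(comment_id, local_model(source.features(comment_id)))


# Toy stand-ins for the two APIs.
class DictSource:
    def __init__(self, table):
        self.table = table

    def features(self, comment_id):
        return self.table[comment_id]


class RecordingPresenter:
    def __init__(self):
        self.scores = {}

    def set_prominence(self, comment_id, score):
        self.scores[comment_id] = score


def local_model(f):
    # The local moderator's predicted recommendation is just one more feature.
    return 0.5 * f["moderator_prediction"] + 0.5 * f["friend_liked"]


source = DictSource({
    "c1": {"moderator_prediction": 1.0, "friend_liked": 0.0},
    "c2": {"moderator_prediction": 0.0, "friend_liked": 0.0},
})
presenter = RecordingPresenter()
personalize(["c1", "c2"], source, presenter, local_model)
```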


I think that we could understand much more about organizing online discussion, and that this is a worthwhile problem for EAs to think about and work on. There seems to be a shortage of concrete technical proposals for how things could be much better, so I wanted to throw this one out there (though I hear there is at least one extensively-worked-out proposal that I haven’t seen).

I don’t think this is a uniquely good proposal, but I do think it is relatively simple and could be a significant improvement over the status quo. It’s also tangled up with the general project of “making AI work for humans” which seems like it might be extremely important.

(discussion at LessWrong)


2 thoughts on “Crowdsourcing moderation without sacrificing quality”

  1. You’re onto something amazing with your proposed model for moderation. I’ve been working on a similar project for the last year & one of the biggest problems facing operators is the inevitable passing of control from moderator to community. Legally the platform operators are accountable for the system control & moderation. I don’t see any operators today taking on that risk even though the computer science community knows the role can be better performed by machines.
