Optimizing the news feed

Humans publish trillions of words each year, yet any given human is going to read only a few of them. How should they choose which ones?

This has long been a tricky and interesting question. But it’s much more fun now that society is basically forced to write an algorithm to answer it.

I like this topic because it is a relatable allegory for value alignment. Conveniently, it’s also an issue du jour on the left, from fake news on Facebook, to filter bubbles, to clickbait headlines, to the public’s preference for shitty journalism. (It also gets play on the right, though the flavor is a little bit different.) How can I resist joining the fray?

I’m just going to talk about the Facebook news feed, because it is such a nice example. That said, I suspect that the general trend towards codification will accelerate and the stakes will rise; by 2025, we might be having a similar debate about the algorithm for deciding whom you should talk to.

If you just want the punchline, you can skip ahead.

Contents: I. Optimization, II. Clicks, III. Paternalism, IV. Motivation, V. Tools, VI. Questions, VII. Putting it all together.

I: Optimization

There are tons of features Facebook can use to decide what to display, and the “right” policy is going to be extremely complicated. This is unavoidable.

If we want to talk about Facebook’s algorithm, I think we should divide the question into two parts: deciding what to optimize, and then figuring out how to optimize it.

This is obvious in the context of machine learning. Machines are radically better than humans at looking over a very large number of features and drawing statistical generalizations about them. So if you want to build a powerful classifier or ranker, it makes a lot of sense to specify the objective, and then use machines to find a policy that optimizes the objective.

But I think that the same thing is true at the level of the organization. If Facebook decides on a goal, then they can run A/B tests to decide whether a change is an improvement; they can evaluate teams’ contributions according to how much they advance the goal; they can talk clearly about their goals and work more effectively.

And even at the level of our civilization, I think that splitting goals from means makes this conversation roughly 100% more coherent and productive. If we want to talk about whether Facebook is “doing the right thing” we don’t want to talk about a complicated heuristic involving a thousand features; we want to talk about whether they are optimizing the right thing.

So now our question becomes: what should Facebook be optimizing? In some contractarian sense the answer is “the net present value of Facebook equity,” but for now I’ll set that aside. I’m going to ask:

In order to be a good global citizen, what should Facebook be optimizing?

II: Clicks

Facebook probably currently optimizes for a combination of clicks/likes/shares/comments; I’ll summarize these indicators as “clicks.”

At face value this may look inevitable. Facebook gets huge amounts of data about clicks, and so they are incredibly easy to optimize. If you ask Facebook to optimize something other than clicks, then it has a harder job.

For example, suppose that I wanted to maximize reported user satisfaction per visit to Facebook. At face value this looks like it must be much less efficient, since “collect a detailed survey” is so much more expensive than “see what the users click on.”

But we can get around this problem by performing the learning in steps:

1. Collect expensive user satisfaction surveys for a small number of random users.
2. Learn to predict user satisfaction given a post and its clicks. For example, we might learn that users are more likely to be satisfied if they liked a post, but that certain spammy features are anti-correlated with satisfaction even conditioned on liking. This model is trained with a small amount of data, so it needs to be relatively simple.
3. Learn to predict clicks given a post; this is probably most of what Facebook currently does.
4. Given a post: use the model from #3 to predict clicks, then use the model from #2 to predict satisfaction.
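Here is a minimal sketch of the steps above, to make the chaining concrete. Everything in it is an assumption for illustration: the post "kinds," the toy numbers, the function names, and the use of simple per-cell averages as the "models." A real system would use far richer features and learned models.

```python
from collections import defaultdict

def fit_click_model(click_log):
    """Estimate P(click | post kind) from a large, cheap click log."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for kind, was_clicked in click_log:
        shown[kind] += 1
        clicked[kind] += int(was_clicked)
    return {kind: clicked[kind] / shown[kind] for kind in shown}

def fit_satisfaction_model(surveys):
    """Estimate E[satisfaction | post kind, clicked] from a small, expensive survey."""
    total = defaultdict(float)
    count = defaultdict(int)
    for kind, was_clicked, satisfaction in surveys:
        total[(kind, was_clicked)] += satisfaction
        count[(kind, was_clicked)] += 1
    return {key: total[key] / count[key] for key in count}

def predict_satisfaction(kind, click_model, satisfaction_model):
    """Chain the two models: average the survey model over the predicted click outcome."""
    p_click = click_model[kind]
    return (p_click * satisfaction_model[(kind, True)]
            + (1 - p_click) * satisfaction_model[(kind, False)])

# Toy data, invented for illustration only.
click_log = ([("clickbait", True)] * 80 + [("clickbait", False)] * 20
             + [("essay", True)] * 30 + [("essay", False)] * 70)
surveys = [
    ("clickbait", True, 0.2), ("clickbait", True, 0.4), ("clickbait", False, 0.5),
    ("essay", True, 0.9), ("essay", False, 0.6), ("essay", False, 0.4),
]

click_model = fit_click_model(click_log)
satisfaction_model = fit_satisfaction_model(surveys)
for kind in ("clickbait", "essay"):
    print(kind, round(predict_satisfaction(kind, click_model, satisfaction_model), 2))
```

In this toy data the clickbait gets far more clicks, but the essay comes out ahead on predicted satisfaction, which is the whole point of not optimizing clicks directly.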

(More precisely: we would do better by training a predictor to directly predict satisfaction from posts, using the predictor from step #2 for variance reduction, as described in this post. This problem is an instance of semi-supervised learning, and the RL version is semi-supervised RL.)
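To give a flavor of the variance-reduction idea in its simplest form, here is a sketch that only estimates average satisfaction rather than training a predictor. It uses a standard "difference estimator": take the cheap proxy's average over everything, then correct it with the average error measured on the small surveyed sample. This is my simplified stand-in, not necessarily the method the linked post describes; the function name and numbers are hypothetical.

```python
from statistics import mean

def difference_estimate(proxy_all, labeled):
    """Estimate mean satisfaction from many cheap proxy scores plus a few surveys.

    proxy_all: proxy predictions for every impression (cheap, possibly biased).
    labeled:   (proxy prediction, surveyed satisfaction) pairs for a small
               random subset of those impressions.
    """
    # Average amount by which the proxy misses the truth on the surveyed subset.
    correction = mean(truth - proxy for proxy, truth in labeled)
    return mean(proxy_all) + correction

# Toy numbers, invented for illustration only.
proxy_all = [0.2, 0.4, 0.6, 0.8]
labeled = [(0.2, 0.3), (0.8, 0.9)]
estimate = difference_estimate(proxy_all, labeled)
```

The better the proxy tracks the surveys, the less the correction term varies from sample to sample, so fewer expensive surveys are needed for the same accuracy.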

We could go further and add intermediate steps. For example, we might use clicks to predict a simple survey and then use a simple survey to predict a complex survey. We could have a long sequence of increasingly expensive indicators, culminating with the very expensive indicator that we are actually optimizing.

A similar phenomenon can occur at the organizational level. For example, Facebook could decide that their real goal is an expensive metric X, and then run small studies to try to figure out which cheap proxies Y accurately predict X. (Which they already do to some extent.)

We can also do the same thing at the level of the cultural debate about Facebook, breaking our question down into: what expensive metric should Facebook be optimizing, and how well are they actually approximating that metric?

III: Paternalism

A natural goal is to “give the user what they want.”

We could strengthen this to “give the user what they would want upon reflection.” (If “reflection” and “average Facebook user” sound incongruous, you can skip ahead to the next section. For now I’m not going to talk about it.)

One objection is that users aren’t especially enlightened. They would rather feel vindicated than explore new perspectives. They can’t really tell what is true or important. If left to their own devices, they will just wander further and further into epistemic hell.

However, I don’t think that Facebook should try to “help” users who have “bad” preferences or views that are hard to change with reflection.

With respect to any of “clickbait,” “fake news,” “filter bubbles,” “crappy journalism”, I think it’s worth distinguishing two situations:

1. There is some process of reflection that the user could go through, which (a) they would endorse as a useful/valid/non-manipulative form of reflection, and (b) would lead them to decide that they should take the “epistemically virtuous” route.
2. There isn’t.

In case #1, we can make everyone happier by giving the user what they want, while encouraging them (and giving them the resources) to do the appropriate kind of reflection before passing judgment.

I think that case #1 captures an awful lot of what we would consider the “problematic” aspects of a Facebook news feed, for many users (and for most of the users who have any chance of not descending into epistemic hell):

For clickbait, simply looking at the article is plenty of reflection to decide that it’s not good. Clickbait is mostly a problem when decisions are based on clicks rather than ex post evaluations.
For false or misleading claims, consulting a fact-checker or reputable source often suffices to discover that the claim is false/misleading, and many readers would prefer not to read false or misleading claims.
For filter bubbles, during reflection users can look at how the content in their feed compares to other content being written on the internet, and then decide whether they are OK with that or would prefer to see a greater diversity of content (even if they would be less likely to click on it).
For “good” journalism, it’s actually a really hard problem to say what people “should” be reading. Generating joy is a primary purpose of leisure time. (And ad revenue is probably not the right way to fund real investigative journalism that you want to e.g. have a political effect.) I think that the “what do they want on reflection” standard is as good as anything.

That said, we can still end up in case #2 a lot of the time. For example, I want to be in a filter bubble with respect to many issues, because in fact most of the content on the internet is poorly argued garbage. I don’t think this would change upon reflection (parts of it might, but probably not the overall sentiment).

Some people who like poorly argued garbage may feel the same way in reverse. I suspect that some of the disagreement would evaporate with modest reflection, and they would end up at some position that I’d consider way-less-terrible but still-not-great (or maybe in some cases I’d agree that it’s not such garbage after all). To the extent that these other people disagree with me even upon reflection, there is a certain symmetry in our positions.

Giving readers what they want-on-reflection is a kind of compromise. It can make everyone’s views better, to everyone’s benefit. But if you have a particular perspective or believe that your epistemic norms are better, you could make things better still by pushing your perspective or norms on others—at the expense of pissing off people who think that the elite establishment is suppressing the views of the masses (which, in fairness, would be a correct judgment).

It may not feel like there is any need to compromise. After all, the press, the tech industry, and the public intellectuals all tend to share a set of common views. And those views are probably correct (says Bay Area techie to other Bay Area techies). Do we really need to compromise with the average reader?

One issue: ultimately people get to choose what they read. For example, if you try to push readers who hate the “mainstream media” towards the mainstream media, they may get annoyed and shift towards other news sources (I think not-entirely-without-justification). That’s obviously bad for Facebook, but it’s also bad for everyone. In the interest of preventing filter bubbles we could inadvertently make them much worse.

A second issue: even when you are the ones with the power in a domain, I think it is best to be respectful of the wishes of people who lack influence. That’s part of living in a respectful and tolerant society, and it also makes everyone better off over the long run.

(Scott Alexander has written a lot on this topic, see e.g. Guided by the Beauty of Our Weapons.)

IV: Motivation

I’ve talked somewhat loosely about what people would want “upon reflection.” A natural question is: can you actually make users reflect?

I think so. The most important point is that we can use a sequence of increasingly accurate approximations: clicks, in-app follow-up, surveys, interviews, and so on. The most expensive approximations could be used extremely rarely (think ~1/million of Facebook users / day). So it’s totally fine if getting users to reflect in an extensive way is very expensive or challenging.

It seems economically and technically feasible to offer hundreds of dollars of incentives and assistance to each person who you want to get “ground truth” data from, and quite a bit more to high-income users (e.g. the median American). That sounds unrealistic for other reasons, but hopefully it gives a sense of the resources we have to work with here.
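For a sense of scale, here is the back-of-the-envelope arithmetic. The sampling rate comes from the text above; the number of daily users and the exact dollar figure are round-number assumptions of mine, not real statistics.

```python
def daily_ground_truth_budget(daily_users, one_in, incentive_dollars):
    """Return (users surveyed per day, dollars spent on incentives per day)."""
    surveyed = daily_users // one_in
    return surveyed, surveyed * incentive_dollars

surveyed, dollars = daily_ground_truth_budget(
    daily_users=1_000_000_000,  # ASSUMPTION: round number, not a real statistic
    one_in=1_000_000,           # "~1/million of Facebook users / day"
    incentive_dollars=300,      # ASSUMPTION: one reading of "hundreds of dollars"
)
print(surveyed, dollars)  # prints "1000 300000"
```

Under these assumptions that is about a thousand in-depth evaluations a day for a few hundred thousand dollars, which is small next to the scale of the product.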

The best existing analogy is probably jury duty. Facebook would probably have to make do with substantially lower levels of detail (e.g. since they might have a harder time convincing employers to give people time off). But again, hopefully it gives some intuition for the scale.

In addition to generous financial incentives for effort, there are other salient reasons to do a good job of deciding what you want:

The decision directly influences what content you’ll see in the future—if you do something stupid, Facebook will be more likely to show you stupid content in the future. (Also some acausal effects on what you saw in the past, which I wouldn’t expect people to care about as much.)
Your decision influences what people like you will see in the future. Many people care about their friends seeing good things.
It affects the overall quality of content and discourse on Facebook, which many people care about modestly.

Overall, I think it’s pretty likely that people could be induced to (sometimes) spend significant amounts of time thinking about what content they would want to see.

If you are able to give a user the time and resources to reflect on what they want to see, and to give them reasonable incentives, and they would still prefer to run with their gut, well, then they get what they have coming. I don’t think most users will act that way, and it would be interesting to make some bets about this.

V: Tools

So far I’ve treated “reflection” as a black box—time goes in, better judgments come out. Alas, that is not quite how it works.

So: how should we organize this reflection? The rough plan is to give the user a bunch of time and resources and see what they decide, but “resources” hides a lot of details.

For example, we may want to provide some guidance to the user, to show them some relevant considerations, to call key facts to their attention. How do we decide what to tell them? If this process is biased, then the entire exercise will be biased. If this process is unhelpful, then the reflection may not go anywhere.

As another example, sometimes the user might want a fact-check of a particular claim: a short summary which rebuts that which can be rebutted, clarifies that which can be clarified, etc. How should such a fact-checker work?

In practice we would just do our best, at least to start. We’d punt questions to Google, and use the fact-checkers that have good PageRank. Our friends would write and share posts on “how to decide what to see on Facebook,” and we’d look at those posts to see what they recommend if the case were tricky. And so on. Realistically I think that would work just fine.

But my heart is always in the long game. I really want to know: what should we be aiming for, growing towards, aspiring to?

I think that the “right” answer is to use a slightly-scaled-back version of the exact same system. What advice do you show the user? The advice the user would want to see, upon reflection. Which fact-checker should they use? The one that they would want to use, upon reflection.

This is a lot like what you get automatically if you have people use Facebook to decide how to evaluate claims. Or if they use Google, but Google uses the same algorithm to decide “what search results should we display?” So in some sense I think this is a natural limit of a society where more and more tools are optimized for “what the user would want upon reflection.”

Of course reality is a lot messier than that. But nevertheless I think this basic idea is a realistic way for technology to trend, and I sure hope that’s how it goes, because the realistic alternatives seem a lot uglier.

VI: Questions

What question do you actually want to ask the user?

In principle the data I’d want from users is “which of these two possible news feeds is better?” This can then be used to optimize the news feed for the user’s preferences (even if they aren’t consistent).

The most useful comparisons are probably between very similar news feeds e.g. with a single story changed, or a single transposition. I expect that Facebook’s optimization basically only uses this kind of per-story data, making mostly-independent decisions for each story.

If Facebook did any kind of more global optimization then they might sometimes need more global comparisons (e.g. if people prefer consistent fonts in their news feeds, then Facebook would want to compare news feeds with font A to news feeds with font B; it wouldn’t be sufficient to do the comparisons story by story). In general, the desired comparisons depend on how Facebook is doing their optimization.

You probably also want to make more than just an A vs. B decision: sometimes the decision is a slam-dunk, sometimes it’s basically a toss-up, and those should be treated differently. I discuss this and many similar issues in a recent post.
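As one hedged illustration of how graded per-story comparisons could be turned into something optimizable, here is a sketch that fits a score per story with a Bradley-Terry model. The model choice, the function name, and the toy data are all my assumptions; I am not claiming this is what Facebook does or what the linked post recommends.

```python
import math

def fit_scores(comparisons, n_items, steps=2000, lr=0.1):
    """Fit one score per story from graded pairwise comparisons.

    comparisons: (a, b, p) triples, where p in [0, 1] says how strongly the
    rater preferred story a over story b (near 1.0 = slam-dunk for a,
    0.5 = toss-up).
    Model (an assumption of this sketch): P(a preferred to b) is
    sigmoid(score[a] - score[b]). Fitted by gradient ascent on the
    cross-entropy log-likelihood.
    """
    scores = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for a, b, p in comparisons:
            predicted = 1.0 / (1.0 + math.exp(scores[b] - scores[a]))
            grad[a] += p - predicted
            grad[b] -= p - predicted
        scores = [s + lr * g for s, g in zip(scores, grad)]
    return scores

# Toy data: story 0 clearly beats stories 1 and 2; stories 1 and 2 are a toss-up.
comparisons = [(0, 1, 0.9), (0, 2, 0.9), (1, 2, 0.5)]
scores = fit_scores(comparisons, n_items=3)
```

Because the labels are graded rather than binary, a toss-up pulls two stories toward equal scores while a slam-dunk pushes them apart, so the two kinds of answer really are treated differently. Note that this per-story model cannot capture the more global preferences (like consistent fonts) mentioned above.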

VII: Putting it all together

A very small fraction of users are selected to provide ground truth data; they may be selected randomly, or by some more sophisticated active learning algorithm. Other users give cheaper proxy measures, e.g. quick in-app surveys about news feed satisfaction. The auxiliary data is used, along with clicks, to help Facebook predict how people would respond if they were selected to provide expensive ground truth.
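The simplest version of this selection is plain random sampling into tiers, sketched below. The one-in-a-million rate for ground truth echoes the figure from section IV; the quick-survey rate and the tier names are made up for illustration, and a more sophisticated active learning scheme would replace the uniform draw.

```python
import random

def assign_tier(rng, p_ground_truth=1e-6, p_quick_survey=1e-2):
    """Pick which kind of feedback to request from one user today.

    p_quick_survey is an ASSUMPTION chosen for illustration.
    """
    u = rng.random()  # uniform in [0, 1)
    if u < p_ground_truth:
        return "ground_truth"   # the expensive, paid, in-depth evaluation
    if u < p_ground_truth + p_quick_survey:
        return "quick_survey"   # a cheap in-app proxy question
    return "clicks_only"        # no extra request; only passive signals

rng = random.Random(0)  # seeded so the sketch is reproducible
tiers = [assign_tier(rng) for _ in range(10_000)]
```

Nearly everyone lands in the clicks-only tier, which is what keeps the expensive ground truth affordable.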

When a user is chosen for the full deal, they are given a pair of (snapshots of) news feeds and asked to evaluate which they would prefer to see (and perhaps by how much). These news feeds are probably very similar, perhaps differing in a single story.

They may be given a sequence of comparisons over the course of a few hours or a day, or perhaps just one. They probably receive some financial compensation, which may depend on the time they spent thinking or on some other measure of effort (as well as on their financial value as Facebook customers).

They may be given some introductory text by Facebook to help orient them. To decide amongst different introductions, Facebook could use a scaled-down version of this whole procedure to predict which introduction the user would prefer.

In order to make their judgment the user can consult Google or Facebook, can talk with friends, and so on. Facebook may help connect them with experts (or give them a badge that can be used to prove to an expert that they are ground-truthing for Facebook, which might motivate some experts to spend more time helping them). Facebook may directly provide whatever help it can.

This process could be split over a few days or weeks if some communication is required, or if the user would like to “wait and see” what happens in the external world over the coming days (e.g. some users might prefer to see posts whose predictions turn out to be correct, or might want to stew on a post for a while before deciding how good it was).

Conclusion

This post was definitely a bit silly, but I feel strongly about the basic underlying mechanisms and arguments. I think that as a society we need to eventually sort these ideas out.

I care most about these mechanisms in the context of powerful AI, but I do think they are relevant now. I also expect Facebook and similar products to move gradually in this direction over the coming decade. If they don’t, I think that’s probably a bad sign.

(discussion at LessWrong)
