### EDT vs CDT 2: conditioning on the impossible

In my last post I presented a basic argument for EDT, and a response to the most common counterarguments.

I omitted one important argument in favor of CDT—that EDT can involve conditioning on a measure zero event, yielding either undefined or undesirable behavior, while CDT is always well-defined. In this post I dive into that argument and explain why I don’t think it’s a good justification for CDT.

(ETA: while I wrote this post, Jessica Taylor took the more useful step of sketching a natural model of decision theory / epistemology that doesn’t have this problem. Her post supersedes some of the arguments here, if you think that reflective oracles are a reasonable model of idealized agents-reasoning-about-agents.)

#### The problem with EDT

The problem occurs when an EDT agent assigns probability 0 to a statement of the form “EDT recommends action a in situation x.” Under these conditions, computing the expected utility E[U|A(x) = a] involves conditioning on a measure zero event and so is not constrained by usual rationality conditions.

In particular, an EDT agent can have self-fulfilling but absurd beliefs. The usual example in the rationalist crowd is the “5-and-10” problem:

• An agent is given the choice between outputting “5” or outputting “10.”
• If it outputs 5 it receives \$5. If it outputs 10 it receives \$10.
• An EDT agent could consistently believe: E[U|I output 5] = \$5, E[U|I output 10] = \$0, P(I output 5) = 1, P(I output 10) = 0. This would result in the agent outputting 5, and so none of its beliefs would be unreasonable.

Note that this problem only occurs when the agent assigns probability exactly 0 to a decision.
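To make the self-fulfilling loop concrete, here is a minimal Python sketch. The helper names (`conditional_expectation`, `edt_choice`) and the dictionary representation of beliefs are assumptions made for illustration, not part of any standard formalization. It shows that the agent's own world model cannot correct the belief about outputting 10, because the conditional expectation on a probability-0 event is undefined.

```python
def conditional_expectation(joint, action):
    """E[U | action] from a joint distribution {(action, utility): probability}.

    Returns None when P(action) == 0, since the usual ratio definition
    divides by that probability and is therefore undefined.
    """
    mass = sum(p for (a, _), p in joint.items() if a == action)
    if mass == 0:
        return None
    return sum(u * p for (a, u), p in joint.items() if a == action) / mass


def edt_choice(expected_utility):
    """Pick the action with the highest believed conditional expected utility."""
    return max(expected_utility, key=expected_utility.get)


# The pathological beliefs from the 5-and-10 example above.
beliefs = {5: 5.0, 10: 0.0}
choice = edt_choice(beliefs)

# The world that results: the agent outputs `choice` and is paid that amount.
joint = {(choice, float(choice)): 1.0}

confirmed = conditional_expectation(joint, 5)      # matches the belief about 5
unconstrained = conditional_expectation(joint, 10)  # None: nothing to check against
```

Here `choice` is 5, the belief about outputting 5 is confirmed by the resulting world, and the belief about outputting 10 is left unconstrained, so the absurd value of \$0 is never contradicted.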

This pathology undermines EDT for two reasons:

• Pragmatic objection: If an EDT agent has pathological beliefs, it will get a lower payoff.
• Aesthetic objection: this pathology suggests that something deeper is wrong with EDT.

#### Outline of counterarguments

In this post I’ll argue:

1. An agent having perfect confidence about its own output should be quite rare—literal 100% confidence should be rare in general, and if anything this special case should be even more rare. (There is a common but mistaken intuition that 100% confidence should be normal.)
2. CDT is in fact vulnerable to a similar pragmatic objection. This undermines the pragmatic objection to EDT, and also helps illustrate the weirdness of the situation where an agent is certain about its own behavior (partly undermining the aesthetic objection).
3. Once an agent becomes certain about what decision it will make, it seems fair to regard the decision as “already made.” Once the agent occupies the unfortunate state of having already made a bad decision, we shouldn’t be surprised that EDT fails. Indeed, the situation is quite analogous to transparent Newcomb. As usual the right response seems to be shifting towards UDT rather than CDT.
4. If conditioning on measure 0 events is a technical problem for our epistemology, then EDT can fix the problem by ensuring that its recommendations are unpredictable to itself. This fix is somewhat unsatisfying, but is nevertheless more justifiable than (and pragmatically superior to) the CDT fix. Of course this results in situations where the EDT agent is almost certain about its own behavior.
5. If an EDT agent becomes almost sure about its behavior then it can become very similar to CDT and for example can two-box on some Newcomb-like problems. This is an unfortunate feature of EDT (relative to UDT), but given that the worst case is “behaves like CDT” I don’t think it should be taken as a pro-CDT argument.

Overall, I find these counterarguments reasonably persuasive, and so I don’t regard the “conditioning on an impossibility” argument against EDT as very strong. This is a question I haven’t really considered in detail since 2012 though, and in general I normally use more updateless decision theories for which this is a much smaller issue, so I’m not confident about this.

I believe the original post was too dismissive of this objection, and more generally that the existence of confusing open problems means we should have more uncertainty about decision theory. However, I still think the balance of arguments we understand today is relatively unambiguous. So I consider it quite plausible that we’d reject EDT/UDT on reflection, but I would be pretty surprised if we’d accept CDT instead.

### More detail

#### 1. Having perfect confidence about your own decision is weird

Suppose I’m faced with the situation x, and am deciding what action to take, i.e. I’m running the program A(x). The reason I’m running this program is because I want to figure out the value A(x), so that I can take that action. It would be weird if I already knew A(x) before running the computation—why bother running the computation then? (In these settings I think it’s fair to describe the decision as “already made,” discussed in the next section.)

On the other hand, consider the following argument that this situation should be very common:

> We can easily compute what EDT does in certain situations; for example, we know that EDT one-boxes in Newcomb’s problem. Indeed, predictably making the “right” decision is part of the intuitive appeal of EDT! Since we can compute these facts, the EDT agent can also compute these facts, and so in “easy” situations it should frequently be certain about its own decision (just as we are certain about its decision).

I think this intuitive argument makes two subtle mistakes.

First, for an EDT agent, even a simple decision dilemma is not defined entirely by the causal structure of the situation—it depends on all of the facts the EDT agent knows. There is no single “Newcomb’s problem” for an EDT agent, the nature of the dilemma changes depending on facts the agent has discovered prior to entering the dilemma. In particular, merely thinking in more detail about Newcomb’s problem suffices to change the dilemma. So the behavior of EDT can be subtle even in “simple” dilemmas. (This is related to the observation in the previous post that CDT yields “simpler” computations.)

Second, it’s a mistake to think that a reasonable EDT agent could have perfect self-confidence. The EDT decision isn’t based on the output of some abstract idealization of EDT, it’s based on a particular instantiation of EDT with a particular epistemic procedure for assessing the truth or probability of claims. If the agent makes an epistemic error, then it will arrive at a bad decision, and so if the EDT agent has less-than-perfect self-confidence then it will be uncertain about its own decision.

I claim that any reasonable epistemic procedure should be expected to always have less-than-perfect self-confidence:

• Pragmatically, it’s completely unclear how we would arrive at perfect self-confidence. Repeated introspection may give us very high self-confidence, but such empirical evidence will never provide perfect self-confidence.
• It looks like perfect self-confidence is in fact impossible for Godelian reasons—an agent that somehow acquired perfect self-confidence would immediately go crazy once it started making predictions about its own future behavior. So if for some reason we had perfect self-confidence, EDT would be the least of our problems. Moreover, the cases that cause an EDT agent to fail are very similar to the Godelian cases that cause problems, so this strikes me as a legitimate defense.

Noticing our confusion about these topics, you could object that agents reasoning about their own reasoning is currently poorly understood, and that EDT should be discarded because it depends on such unformalized reasoning. (Our best current formalizations are reflective oracles and logical inductors.) But I think this is specious—whether you use EDT or CDT, most realistic decision problems involve reasoning about the reasoning process of other agents, and so we don’t really have any choice but to make our best effort.

It’s possible for an EDT agent to have perfect confidence about its own decision without understanding why it made that decision, and in particular without perfect epistemic self-confidence. For example, in Newcomb’s problem the (perfectly trusted) predictor could simply tell the EDT agent what it is going to do in advance. Upon hearing that it is definitely going to 2-box, the EDT agent might well 2-box. This situation is discussed in section 3, where I argue it’s basically the same as other cases where EDT or CDT regrets learning some information.

##### Does EDT even do well in Newcomb-like cases then?

You might think that the EDT agent’s uncertainty about its own behavior would effectively behave like randomization: if its epistemic procedure goes crazy then it behaves somewhat chaotically, and when it conditions on making a weird decision it assumes that its decision procedure went crazy. This looks like it might result in CDT-like decisions for Newcomb’s problem. But the behavior only looks random to agents that are weaker than the EDT agent. A sufficiently powerful predictor can tell whether the EDT agent is going to go crazy on this decision, and can therefore make a correct prediction.

In the case where you have a predictor who can’t tell whether the EDT agent is going to go off the rails, then the EDT agent does behave like CDT. So against such predictors, EDT might two-box and receive a bad payoff. Of course, once we are uncertain about whether the predictor is good enough to cause the EDT agent to one-box, now we are back to the good case for EDT, where it has legitimate uncertainty that can be used to behave normally.
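The contrast between the two kinds of predictor can be illustrated with a toy model. Everything in it is an assumption introduced for this sketch rather than something specified above: the standard Newcomb payoffs (\$1,000,000 and \$1,000), a probability `eps` that the agent's epistemic procedure "goes crazy" and acts uniformly at random, a "strong" predictor that foresees the actual action, and a "weak" predictor that only foresees what the agent does when sane.

```python
from fractions import Fraction

BIG, SMALL = 1_000_000, 1_000  # assumed standard Newcomb payoffs


def conditional_eu(sane_action, eps, predictor):
    """Return {action: E[U | action]} by enumerating the joint distribution.

    sane_action: what the agent does when its reasoning works ("one" or "two").
    eps: probability that the agent goes crazy and acts uniformly at random.
    predictor: "strong" predicts the actual action, "weak" predicts sane_action.
    """
    totals = {"one": [Fraction(0), Fraction(0)], "two": [Fraction(0), Fraction(0)]}
    worlds = [
        (sane_action, 1 - eps),  # reasoning works as intended
        ("one", eps / 2),        # crazy, happens to one-box
        ("two", eps / 2),        # crazy, happens to two-box
    ]
    for action, prob in worlds:
        predicted = action if predictor == "strong" else sane_action
        payoff = (BIG if predicted == "one" else 0) + (SMALL if action == "two" else 0)
        totals[action][0] += prob * payoff
        totals[action][1] += prob
    return {a: num / den for a, (num, den) in totals.items()}


def edt_fixed_points(eps, predictor):
    """Sane actions that EDT itself recommends, given that self-model."""
    fixed = []
    for sane_action in ("one", "two"):
        eu = conditional_eu(sane_action, eps, predictor)
        if max(eu, key=eu.get) == sane_action:
            fixed.append(sane_action)
    return fixed


strong = edt_fixed_points(Fraction(1, 100), "strong")
weak = edt_fixed_points(Fraction(1, 100), "weak")
```

In this toy model the only consistent choice against the strong predictor is one-boxing, while against the weak predictor it is two-boxing, because conditioning on the unexpected action is explained entirely by "I went crazy" and carries no information about the box contents.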

#### 2. CDT is vulnerable to a similar pragmatic objection

I’ll argue that CDT also chokes when it is 100% sure that it won’t take action a. This suggests that this (weird) situation ought to be considered an open problem in decision theory broadly rather than a distinctive failure of EDT.

In order to decide what to do, the CDT agent considers the expected utility E[U|do(A=a)].

In this counterfactual, the CDT agent will immediately observe that it has decided to take action A=a. At this point it conditions on the event “I took action A=a.” But that event has zero probability.

The resulting beliefs of the CDT agent become arbitrary, in the same way that the EDT agent’s beliefs become arbitrary. The resulting behavior is also arbitrary, and if this behavior would be bad then the CDT agent’s pathological beliefs become self-reinforcing.

You could patch this problem in a few ways:

• The CDT agent could fail to update on its own past behavior. Of course this fix only works if the CDT agent is the only thing in the environment that it cares about—if for example it is cooperating with someone, that person will presumably update on their behavior and so could behave arbitrarily badly. More generally, surgically modifying CDT to avoid updating on observations about the world is weird and poorly motivated (in addition to making the agent perform worse).
• When performing causal surgery, the CDT agent could intervene on all future actions rather than merely the next action, thereby making its own beliefs irrelevant. This is more aesthetically appealing, but has the same difficulty as the previous fix if the agent is e.g. collaborating with peers. It also results in a bad (and incompletely specified) decision procedure in the single-agent case, where the CDT agent either fails to account for its own computational limitations in the future or else fails to account for new computations that it will perform in the future.
• The CDT agent could entertain various skeptical hypotheses, in which e.g. it is in a dream, it is deeply mistaken about the laws of physics, etc. Perhaps the behavior in these skeptical hypotheses remains sensible, so that we don’t need to worry. But in general it’s not clear why it would do anything good, and if we are willing to allow this kind of skeptical hypothesis then EDT could resolve its conditioning-on-impossibility problem in a similar way.

Overall, I think these cases present a pragmatic problem for CDT. If the agent is making a single big decision, then the pragmatic objection to EDT is stronger. But in realistic cases an agent makes a large number of decisions and individual decisions are not too important relative to the sum of all future decisions, and so in practice the two objections seem comparably severe.

#### 3. If an agent is 100% certain that it’s going to make a bad decision, the damage is already done

I don’t think we have much reason to expect agents to have perfect advance knowledge about their own decisions, but we can imagine it happening. For example, I might encounter a trusted predictor who says “After hearing this prediction, you will do A.” (Though it depends on the details of my decision procedure whether or not there is any A for which that prophecy would actually be self-fulfilling.)

How worried should we be about the fact that EDT can make the wrong decision, after becoming certain that it will make a bad decision?

As an analogy, consider the transparent Newcomb problem:

1. You walk into a room, and a trusted+reliable predictor tells you whether they expect you to pay in step 3.
2. If they predict you’ll pay regardless of what happened in step 1, they give you \$100. Otherwise they give you \$1.
3. In either case, you now have the opportunity to pay \$10.

Once EDT sees how much money it receives, its only remaining uncertainty is about whether or not it will pay. So the EDT agent never pays, and always receives \$1.

In this case I think there is general agreement that UDT is the “right” way to get a good outcome, and the only question is whether it is rational to pay up once you enter the situation or if it is merely rational to commit to paying up while considering the situation in advance.
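The payoff structure of this game can be sketched in a few lines of Python. This is a simplified reading of the three steps above, under the assumption that the announcement in step 1 and the amount handed over in step 2 carry the same information, so a policy is modeled as a map from the amount received to a pay/don't-pay decision. The names `net_payoff`, `always_pay`, and `never_pay` are illustrative.

```python
AMOUNTS = (100, 1)  # possible amounts handed over in step 2
COST = 10           # the payment offered in step 3


def net_payoff(policy):
    """Net winnings for a policy {amount_received: pays?}.

    The predictor hands over $100 only if the policy pays regardless of
    what it sees; otherwise it hands over $1.
    """
    pays_regardless = all(policy[amount] for amount in AMOUNTS)
    received = 100 if pays_regardless else 1
    return received - (COST if policy[received] else 0)


always_pay = {100: True, 1: True}    # the updateless-style policy
never_pay = {100: False, 1: False}   # what EDT does once it has seen the money

always_pay_net = net_payoff(always_pay)
never_pay_net = net_payoff(never_pay)
```

Under these assumptions the always-pay policy nets \$90 and the never-pay policy nets \$1, which is the sense in which committing in advance beats deciding after the money is already on the table.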

Some EDT agents may have similar self-fulfilling loops in more absurd cases (e.g. 5-and-10), but structurally the situation seems the same: once the EDT agent gets told that it is going to make a bad decision by a perfectly trusted predictor, the EDT agent in fact makes a bad decision and regrets having been told the fact. (In fact an EDT or CDT agent can regret hearing all kinds of information, there is nothing special about this case.)

Exactly the same thing happens if the EDT agent becomes certain about its own behavior in some other way.

Framed this way, it seems clear that if I tell an agent it’s going to make a mistake, it would be better for the agent if it didn’t believe me. That is, the agent would be better off if it ensures that it’s the kind of agent for whom bad prophecies aren’t self-fulfilling (rather than being in the default state, where we have no idea what kinds of prophecies are or aren’t self-fulfilling).

But this is the same as all kinds of other pragmatic objections to EDT—an EDT agent would be better off if it ensures that it’s the kind of agent who doesn’t pay up to blackmail, or the kind of agent who continues to pay up in Newcomb’s problem even when it gets told a good prediction about whether it will pay up.

If we were willing to bite the bullet on transparent Newcomb and end up not receiving \$100, then I think we should also be willing to bite this bullet and sometimes make a bad decision when we become certain that we are going to make a bad decision. Conversely, if transparent Newcomb motivates us to be updateless, then we would also be updateless when learning about our own behavior, and so we would try to retreat to a prior epistemic state where we are ignorant about our own decision, thereby avoiding the problem (assuming that we can find a suitably ignorant prior position from which to make the decision).

#### 4. Conditioning on measure 0 events isn’t a serious technical problem

So far I’ve argued about whether we should be worried that EDT behaves poorly when it becomes certain about its own decision. You could instead be concerned about a more technical problem: that an EDT agent simply fails on some inputs, and so isn’t even a candidate for a legitimate decision procedure. I’ll set aside the question of whether this is a valid style of objection, and simply argue that even if this is a technical problem, it is quite easy to fix and the resulting fix is no more unnatural than CDT itself.

To fix this problem, we can have the agent behave as follows (which the LW crowd calls “playing chicken with the universe”):

• If you become 100% certain that you won’t take action X, take action X.
• Otherwise use EDT.

In fact the first of these clauses will never occur, unless you are both dogmatic and wrong. So from the in-the-moment perspective adding the first clause doesn’t change the desirability of the procedure. Of course stepping back from the decision, the existence of the first clause can make the situation better or worse. (Though if you are concerned that EDT behaves badly on inputs where it becomes certain about its own behavior, then you should expect it to make things better.)
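As a rough sketch, the two-clause procedure might look like the following in Python. The function name `edt_with_chicken` and the dictionary inputs are assumptions for illustration; a real agent would derive these probabilities and conditional expectations from its own epistemic procedure rather than receive them as arguments.

```python
def edt_with_chicken(actions, prob_of_action, conditional_eu):
    """Pick an action using EDT plus the chicken clause.

    prob_of_action: the agent's credence that it takes each action.
    conditional_eu: the agent's E[U | action] for each action.
    """
    # Chicken clause: if some action is believed impossible, take it.
    for action in actions:
        if prob_of_action[action] == 0:
            return action
    # Otherwise every conditional expectation is well-defined: use plain EDT.
    return max(actions, key=lambda a: conditional_eu[a])


# The dogmatic (and wrong) 5-and-10 agent: certain it outputs 5.
dogmatic = edt_with_chicken([5, 10], {5: 1.0, 10: 0.0}, {5: 5.0, 10: 0.0})

# An agent with ordinary uncertainty and sensible conditional expectations.
ordinary = edt_with_chicken([5, 10], {5: 0.01, 10: 0.99}, {5: 5.0, 10: 10.0})
```

In the first call the chicken clause fires and the agent outputs 10, falsifying its certainty that it outputs 5, so that belief state cannot be a stable one. In the second call the clause never fires and the procedure is just EDT.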

Including this term puts us in basically the same situation discussed in section 1 where we are guaranteed to be uncertain about our own decision, though the uncertainty may only be about whether the “chicken clause” fires. For the reasons discussed in that section, the explicit chicken clause is probably unnecessary (though it depends on what the agent would choose to do if it entered an inconsistent epistemic state).

If this term is necessary in order to avoid conditioning-on-impossibilities, a CDT-proponent might argue:

> Sure, there is an “EDT-like” procedure that is immune to conditioning-on-impossibilities, but that procedure is no longer natural, so it’s on weaker philosophical footing than EDT.

I agree that adding the chicken rule puts EDT on weaker philosophical footing, since it makes the procedure more complex and we don’t have any strong argument for why we should do it once actually faced with a decision.

However, aside from the human intuition in favor of causal intervention, the chicken rule is no stranger than the arbitrary causal surgery required by CDT (and it’s certainly a simpler procedure). Moreover, the positive argument for EDT still applies to EDT-with-chicken—in the case that always actually happens, where the chicken rule doesn’t fire, EDT is still correctly capturing the decision while CDT is doing something arbitrary.

I think that the main reason to prefer CDT over EDT-with-chicken is the same as the main reason to prefer CDT over EDT—namely the intuition that causal interventions are what matter. But those intuitions are debunked for the same reasons as intuitions about CDT vs. EDT. (In fact it’s even starker: evolution never encountered the situation where an agent had 100% confidence about its own decision, and that case exhibits completely different behavior than 99.999% confidence, so intuitions produced by evolution should be particularly meaningless for evaluating arguments about such cases.)

#### 5. In bad cases EDT degrades to CDT, but that’s no argument for CDT

Some of the arguments I’ve made would imply that EDT agents can’t be perfectly confident about their own decisions. But there can still be situations where they are very confident about their decisions, and their remaining uncertainty is kind of weird.

In some unfortunate situations this could result in CDT-like behavior. I regard this as a negative feature of EDT, because I think CDT is often straightforwardly wrong. But I don’t think you can view “EDT sometimes does the same thing as CDT” as an argument for CDT!