## Monday, November 26, 2007

### probability and public policy III: bayes' rule

[part one of this series here]

So, I have a subjective belief about what will happen when a coin is tossed; now, how can I adjust this belief in light of evidence (say, after I've flipped the coin 100 times)?

When we discuss probability, we often do so conditionally. For example, we claim "the probability that the coin will come up heads is 1/2, given that the coin is fair." We write the probability that A is true given that B is true as P(A|B). A basic fact of the probability calculus relates the probability of A&B to the probability of B conditional on A:
P(A)P(B|A) = P(A&B)

But "and" is a symmetric relation, i.e. A&B if and only if B&A, so

P(A)P(B|A) = P(A&B) = P(B&A) = P(B)P(A|B)

and, dividing both sides by P(A) (assuming P(A) > 0), we get Bayes' Rule:

P(B|A) = P(B)P(A|B)/P(A)
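As a quick numerical sanity check of the derivation, here is a minimal Python sketch on a small joint distribution over two yes/no events A and B. The four joint probabilities are made up purely for illustration; the conditional probabilities are computed from the joint as P(B|A) = P(A&B)/P(A), and exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical joint distribution over two yes/no events A and B.
# Keys are (A is true, B is true); the four probabilities sum to 1.
joint = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(3, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}

# Marginals: P(A) sums over both values of B, P(B) over both values of A.
p_a = sum(p for (a, b), p in joint.items() if a)
p_b = sum(p for (a, b), p in joint.items() if b)
p_a_and_b = joint[(True, True)]

# Conditional probabilities defined from the joint.
p_b_given_a = p_a_and_b / p_a
p_a_given_b = p_a_and_b / p_b

# Product rule, both ways round: P(A)P(B|A) = P(A&B) = P(B)P(A|B)
assert p_a * p_b_given_a == p_a_and_b == p_b * p_a_given_b

# Bayes' Rule: P(B|A) = P(B)P(A|B)/P(A)
assert p_b_given_a == p_b * p_a_given_b / p_a
print(p_b_given_a, p_a_given_b)  # prints: 1/4 1/3
```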

OK, but what does this mean? Bayes' Rule tells us how to update our subjective belief state in light of new evidence. To see this, replace B by H for "hypothesis" and A by E for "evidence":

P(H|E) = P(H)P(E|H)/P(E)

What Bayes' Rule now tells us is that our probability that a hypothesis H is true, given that we receive evidence E, is just our prior probability that H is true, times the probability that we would receive evidence E if H were true, divided by the probability of E. (P(E) is usually found by summing P(Hi)P(E|Hi) over every hypothesis Hi under consideration.)
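The recipe in that paragraph, including the sum that gives P(E), fits in a few lines. This is a minimal sketch that assumes a finite set of mutually exclusive, exhaustive hypotheses; the function name `bayes_update` and the dictionary layout are illustrative choices, not any standard API.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Return {H: P(H|E)} for every hypothesis H in `prior`.

    prior:      {H: P(H)}, assumed mutually exclusive and exhaustive
    likelihood: {H: P(E|H)}
    """
    # P(E): sum of P(H)P(E|H) over all hypotheses under consideration
    p_e = sum(prior[h] * likelihood[h] for h in prior)
    if p_e == 0:
        raise ValueError("E has probability 0 under every hypothesis")
    # Bayes' Rule for each hypothesis: P(H|E) = P(H)P(E|H)/P(E)
    return {h: prior[h] * likelihood[h] / p_e for h in prior}

# Hypothetical example: a fair coin vs. a two-headed coin, one head observed.
prior = {"fair": Fraction(1, 2), "two-headed": Fraction(1, 2)}
likelihood = {"fair": Fraction(1, 2), "two-headed": Fraction(1)}
print(bayes_update(prior, likelihood))
```

Here a single head moves the fair-coin hypothesis from 1/2 to 1/3 and the two-headed hypothesis to 2/3. Because P(E) is exactly the sum of the numerators, the posteriors always add up to 1.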

Updating by Bayes' Rule ensures that one's probability distribution always remains consistent. It is important to note, however, that one's conclusion about the probability of hypothesis H given evidence E depends upon one's prior assignment of probabilities. Still, as evidence accumulates, your belief state will tend to converge toward the "actual" probability, provided your prior did not rule the truth out by assigning it zero probability: so, if you believe the coin to be fair, but it is flipped 100 times and every toss comes up heads, your degree of belief that the coin is fair will be very low in light of this evidence.
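To see the coin case concretely, the sketch below updates one flip at a time. It assumes, purely for illustration, that the only alternative to a fair coin is a biased one with P(heads) = 0.9, and it starts from a strong (hypothetical) prior of 0.99 that the coin is fair.

```python
def update_on_heads(p_fair, p_heads_if_biased=0.9):
    """One Bayes update after observing a single head.

    Assumes exactly two hypotheses (an illustrative choice): the coin is
    fair, P(heads) = 0.5, or biased with P(heads) = p_heads_if_biased.
    """
    p_biased = 1 - p_fair
    # P(E): total probability of a head, summed over both hypotheses
    p_heads = p_fair * 0.5 + p_biased * p_heads_if_biased
    # Bayes' Rule: P(fair | heads) = P(fair) * P(heads | fair) / P(heads)
    return p_fair * 0.5 / p_heads

belief_fair = 0.99  # assumed strong prior that the coin is fair
for flip in range(1, 101):
    belief_fair = update_on_heads(belief_fair)
    if flip in (1, 10, 50, 100):
        print(f"after {flip:3d} heads: P(fair) = {belief_fair:.3g}")
```

Starting instead from `belief_fair = 1.0` leaves the belief at exactly 1.0 after every head, since the biased hypothesis then gets zero weight in P(E); this is why convergence needs a prior that does not rule the truth out.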

To illustrate how belief update occurs, consider a contrived example. A stubborn but rational man, Smith, thinks it is unlikely that cigarette smoking causes lung cancer. For Smith, say, P(cigs cause cancer) = 0.2. He licenses only one alternative hypothesis: that severe allergies cause cancer. Since Smith treats these two hypotheses as mutually exclusive and exhaustive, on pain of inconsistency he must believe P(allergies cause cancer) = 0.8.

Now, suppose Smith's Aunt Liz dies of lung cancer. Suppose further that Aunt Liz was a heavy smoker her entire life, so that P(Liz gets cancer | cigs cause cancer) = 1 (certainty). Suppose also that Liz had minor allergies for most of her life; since these allergies were only minor, let's say the probability that she gets cancer under the hypothesis that severe allergies cause cancer is only 0.5.

Briefly, how should we calculate P(E) here? We sum over the weighted possibilities, writing H1 for "cigs cause cancer" and H2 for "allergies cause cancer":

P(E) = P(H1)P(E|H1) + P(H2)P(E|H2) = 0.2(1) + 0.8(0.5) = 0.6

So, now we can use Bayes' Rule to calculate Smith's (only consistent) subjective degree of belief in the hypothesis that cigarettes cause cancer given the evidence that Aunt Liz has died of cancer.

P(H=cigs cause cancer) = 0.2
P(E=Liz gets cancer | H=cigs cause cancer) = 1
P(E=Liz gets cancer) = 0.6

Plugging these values into Bayes' Rule we get:

P(H|E) = P(H)P(E|H)/P(E) = 0.2(1) / 0.6 = 1/3
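The Aunt Liz numbers can be checked with exact fractions. This short sketch computes P(E) by the weighted sum and then the posterior for both of Smith's hypotheses; the dictionary keys are just labels chosen here.

```python
from fractions import Fraction

# Smith's priors over his two hypotheses (treated as exhaustive)
prior = {
    "cigs cause cancer": Fraction(1, 5),
    "allergies cause cancer": Fraction(4, 5),
}
# Probability that Liz gets cancer under each hypothesis
likelihood = {
    "cigs cause cancer": Fraction(1),
    "allergies cause cancer": Fraction(1, 2),
}

# P(E): sum over the weighted possibilities
p_e = sum(prior[h] * likelihood[h] for h in prior)
# Bayes' Rule applied to each hypothesis
posterior = {h: prior[h] * likelihood[h] / p_e for h in prior}

print(p_e)  # prints: 3/5
for h, p in posterior.items():
    print(h, p)  # prints "cigs cause cancer 1/3", then "allergies cause cancer 2/3"
```

The two posteriors, 1/3 and 2/3, sum to 1, so Smith's updated beliefs remain a consistent distribution over his two hypotheses.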

So, in light of this evidence, Smith's belief in the hypothesis that cigarettes cause cancer has increased from 1/5 to 1/3. Two important points to note here: i) The probability calculus only tells us how to update prior beliefs consistently; it does not tell us what belief state to start from. Given sufficiently different prior probability distributions, two agents may draw dramatically different conclusions from the same data. ii) Our calculation of P(E) depended upon the space of hypotheses we were considering. If an agent has failed to consider the actual cause of a piece of evidence E as a potential cause, he may perceive E as increasing the probability of a spurious hypothesis. (Suppose, for example, that a particular gene actually causes both a tendency to smoke and a tendency to succumb to cancer. Then smoking will be a decent predictor of cancer, provided sufficiently few non-carriers of the gene smoke, without being its cause. Nevertheless, if the agent does not license this additional possibility, the data will still support the hypothesis that cigarettes cause cancer.)
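Point ii) can be made concrete by rerunning the Aunt Liz update with a third hypothesis. In this sketch the gene hypothesis, its prior of 2/5 (taken from the allergy hypothesis, leaving the cigarette prior at 1/5), and the assumption that it predicts Liz's cancer with certainty are all hypothetical numbers chosen for illustration.

```python
from fractions import Fraction

def posterior_over(prior, likelihood):
    # P(E) depends on which hypotheses appear as keys in `prior`
    p_e = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / p_e for h in prior}

# Probability that Liz (a heavy smoker with minor allergies) gets cancer
# under each hypothesis; the "gene" value is a hypothetical assumption.
likelihood = {"cigs": Fraction(1), "allergies": Fraction(1, 2), "gene": Fraction(1)}

# Smith's original two-hypothesis space
two = posterior_over({"cigs": Fraction(1, 5), "allergies": Fraction(4, 5)}, likelihood)

# Hypothetical three-hypothesis space in which the gene is licensed too
three = posterior_over(
    {"cigs": Fraction(1, 5), "allergies": Fraction(2, 5), "gene": Fraction(2, 5)},
    likelihood,
)

print(two["cigs"])    # prints: 1/3
print(three["cigs"])  # prints: 1/4
```

Licensing the gene hypothesis raises P(E) from 3/5 to 4/5, so the same evidence lifts the cigarette hypothesis only to 1/4 instead of 1/3. Because the cigarette and gene hypotheses both predict Liz's cancer with certainty, her case leaves their ratio unchanged at 1:2; only evidence that they predict differently can separate them.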

With these caveats in mind, however, we can see how a sufficiently large body of evidence that the alternative hypotheses explain poorly will eventually push an agent's belief state toward the "true" conclusion: if Smith sees enough smokers without severe allergies get cancer, he will come to assign a high probability to the hypothesis that cigarettes cause cancer.
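A rough sketch of that convergence, under stated assumptions: each observed case is a smoker whose allergies, like Aunt Liz's, are only minor, so it has probability 1 under the cigarette hypothesis and 0.5 under the allergy hypothesis, and cases are treated as independent given the hypothesis. The function name `smith_belief_in_cigs` is hypothetical.

```python
def smith_belief_in_cigs(n_cases, prior_cigs=0.2, p_cancer_if_allergies=0.5):
    """Smith's P(cigs cause cancer) after n_cases Liz-like cancer cases.

    Assumes each case has probability 1 under the cigarette hypothesis and
    p_cancer_if_allergies under the allergy hypothesis, and that cases are
    independent given the hypothesis. Updates once per case.
    """
    belief = prior_cigs
    for _ in range(n_cases):
        # P(E) for this case, summed over Smith's two hypotheses
        p_e = belief * 1.0 + (1 - belief) * p_cancer_if_allergies
        # Bayes' Rule; yesterday's posterior is today's prior
        belief = belief * 1.0 / p_e
    return belief

for n in (1, 5, 10, 20):
    print(n, round(smith_belief_in_cigs(n), 4))
```

With one case the function reproduces the 1/3 from the Aunt Liz calculation (up to floating-point rounding); after n such cases the belief equals 1/(1 + 4 * 0.5^n), which approaches 1 as n grows.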

next: frequentism