Thursday, July 30, 2009

information ii

Last time we considered the information in a single event and determined that a decent measure of that information might be:

I(e) = − log P(e)
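(If you want to play with this definition, a couple of lines of Python will do. I'm taking logs base 2 here, so the unit is bits; the choice of base is just a choice of unit, which we'll come back to below.)

    import math

    def self_information(p):
        """Information, in bits, carried by an event of probability p."""
        return -math.log(p, 2)

    print(self_information(0.5))      # 1.0 bit: a fair coin comes up heads
    print(self_information(1.0 / 6))  # ~2.585 bits: a fair die shows a six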

Suppose, however, that we are not interested in the information in a single event, but in a source of information. We can think of this as a source of events, or a source generating signals. All that really matters is that there are a number of possibilities, and they appear with different frequencies. Let's use π to denote a source and pi to denote the probability of the ith outcome amongst the n possible outcomes π might generate.

What can we say about the information generated by this source? Well, suppose we want a measure of the average amount of information in any signal coming from this source. This could mean something like "How chaotic is this source?" Suppose we are looking at fluctuations in background radiation—are they the result of chance (i.e. chaotic) or are they remnants of communications from an alien civilization?

An equivalent interpretation is this: "How efficient is the coding of this source's signal?" If there is a lot of redundancy, then there is less information in each event generated by the source. For example, if the source is generating letters of the alphabet and it generates "th" then we would not be surprised if the next symbol is "e"—if the source is generating English text, that is. This is because there is a lot of redundancy in written English. Whenever the probability of the next symbol rises above 1/26, i.e. whenever some symbols are more likely than others, we receive less information when those likely symbols occur.
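To put a (made-up) number on that: suppose that, after "th", the letter "e" follows with probability 0.4 rather than the uniform 1/26. Then seeing the "e" tells us far less than a uniformly random letter would:

    import math

    uniform = 1.0 / 26   # every letter equally likely
    after_th = 0.4       # hypothetical probability of "e" following "th"

    print(-math.log(uniform, 2))   # ~4.70 bits if all 26 letters were equally likely
    print(-math.log(after_th, 2))  # ~1.32 bits when "e" was expected anyway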

A third interpretation is this: "How predictable is the source?" Suppose the source's signal is generated by the toss of a fair die: the symbol "x" is sent if the outcome is 1, 2, 3, 4, or 5, and the symbol "y" is sent if the outcome is 6. Then "x" should appear five times as often as "y", and we receive much less information whenever "x" occurs. So, if all we do is predict "x", we'll be right five times out of six.
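Working through the numbers for this die source (again in bits, and already hinting at the averaging Shannon formalizes below):

    import math

    p_x, p_y = 5.0 / 6, 1.0 / 6      # "x" for outcomes 1-5, "y" for outcome 6

    info_x = -math.log(p_x, 2)       # ~0.263 bits: "x" tells us very little
    info_y = -math.log(p_y, 2)       # ~2.585 bits: "y" is a genuine surprise

    print(info_x, info_y)
    print(p_x * info_x + p_y * info_y)   # ~0.65 bits per symbol, on average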

So, a chaotic source is an efficiently coded source is an unpredictable source. But how can we put a measure on this property? In "A Mathematical Theory of Communication" (1948), Claude Shannon argues that we need a function H satisfying three intuitive properties:

1. H should be continuous over the event probabilities—i.e. small changes in the probability distribution should produce only small changes in H.

2. If all the events generated by a source are equally probable (i.e. for n events, the probability of each is 1/n), then the value of H should increase as the number of possibilities n increases (intuitively, there is more information in a die toss than in a coin flip).

3. If the end results (the sequence of events) are generated by different probabilistic processes, but the ultimate probability of each symbol is the same, then H should return the same value in each case. For example, consider two sources, π1 and π2. π1 first flips a fair coin. If it comes up heads, it sends "a"; if it comes up tails, π1 rolls a fair die and sends "b" if the outcome is 1, 2, 3, or 4 and "c" if the outcome is 5 or 6. π2 just rolls a fair die and sends "a" if the outcome is 1, 2, or 3; "b" if the outcome is 4 or 5; and "c" if the outcome is 6. Although the signals are generated by different mechanisms, in both cases "a" will appear with probability 1/2, "b" with probability 1/3, and "c" with probability 1/6. So, H should assign the same value to both π1 and π2.
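(A quick check of those numbers, computing each symbol's probability from the two mechanisms exactly as described above:)

    from fractions import Fraction as F

    # pi_1: flip a fair coin; on tails, roll a fair die
    pi_1 = {
        "a": F(1, 2),            # heads
        "b": F(1, 2) * F(4, 6),  # tails, then die shows 1-4
        "c": F(1, 2) * F(2, 6),  # tails, then die shows 5 or 6
    }

    # pi_2: roll a fair die once
    pi_2 = {
        "a": F(3, 6),  # die shows 1, 2, or 3
        "b": F(2, 6),  # die shows 4 or 5
        "c": F(1, 6),  # die shows 6
    }

    print(pi_1 == pi_2)  # True: the same distribution (1/2, 1/3, 1/6), by different routes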

Shannon proves that all functions which satisfy these three desiderata will be of the form:

H = − K ∑i=1…n pi log pi

where K is a positive constant (essentially, it fixes the scale of the measure) and the base of the log represents a choice of units.
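Here is a sketch of H in code, with K = 1 and logs base 2 (so the answers come out in bits), checked against the examples above:

    import math

    def entropy(probs):
        """Shannon entropy H = -sum(p * log2 p), with K = 1, in bits."""
        return -sum(p * math.log(p, 2) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))                 # 1.0 bit: a fair coin
    print(entropy([1.0 / 6] * 6))              # ~2.585 bits: a fair die (property 2)
    print(entropy([5.0 / 6, 1.0 / 6]))         # ~0.65 bits: the predictable x/y source
    print(entropy([0.5, 1.0 / 3, 1.0 / 6]))    # ~1.459 bits: pi_1 and pi_2 alike (property 3)

Predictable sources score low and uniform sources score high, which matches all three readings above.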

Shannon called H the entropy of a source. This is because it happens to be the same equation as is used to calculate entropy in statistical mechanics (note, however, that the derivation of the equation is totally different). Is there any deep connection between entropy in information theory and entropy in statistical mechanics? This is an open question about which opinions differ wildly.
