Sunday, December 30, 2007


. . . . the stamp of maturity?

Saturday, December 29, 2007


Whist is, without question, the best of all our domestic games. The only other one which could lay claim to such a distinction is Chess; but this has the disadvantage of containing no element of chance in its composition—which renders it too severe a mental labour, and disqualifies it from being considered a game, in the proper sense of the word. Whist, on the contrary, while it is equal to chess in its demands on the intellect and skill of the player, involves so much chance as to give relief to the mental energies, and thus to promote, as every good game should, the amusement and relaxation of those engaged.

from William Pole, F.R.S., The Theory of the Modern Scientific Game of Whist, 1883

Friday, December 28, 2007


We live in an age of disinformation, confusion, and mystery. Even within a single report, we see contradiction:

apparently, "there were no bullet marks on Bhutto's body"; nevertheless, "she was shot" . . . .

. . . . ?

Tuesday, December 25, 2007

sad clown christmas


. . . on the mind each holiday season, when many set foot inside a church for the first time in 364 days. Neal Stephenson argues that the elevation of hypocrisy from minor foible to cardinal sin is a product of the rampant relativism of the 21st century. In an age where there are no moral absolutes, the only standpoint from which one can criticize another is one's own, i.e. only hypocrisy, failing by one's own lights, moral self-contradiction, remains an absolute sin.

Even if this observation is accurate, however, the question remains: pragmatically, is hypocrisy useful to society / humanity or no? Perhaps, more generally: can cognitive dissonance be a precipitous means to constructive ends? Is this an empirical question?

Monday, December 17, 2007

understanding the mind VII

from Jean-Pierre Changeux, The Physiology of Truth, 2002

Friday, December 14, 2007

an alternate perspective

We should contrast our reluctance to accept the accidents of history as arbiter of obscenity with the more thoroughgoing stubbornness characteristic of full-fledged Saxonism:
Saxonism is a name for the attempt to raise the proportion borne by the originally & etymologically English words in our speech to those that come from alien sources. The Saxonist forms new derivatives from English words to displace established words of similar meaning but Latin descent; revives obsolete or archaic English words for the same purpose; allows the genealogy of words to decide for him which is the better of two synonyms. . . . The truth is perhaps that conscious deliberate Saxonism is folly, that the choice or rejection of particular words should depend not on their descent but on considerations of expressiveness, intelligibility, brevity, euphony, or ease of handling, & yet that any writer who becomes aware that the Saxon or native English element in what he writes is small will do well to take the fact as a danger-signal. But the way to act on that signal is not to translate his Romance words into Saxon ones; it is to avoid abstract & roundabout & bookish phrasing whenever the nature of the thing to be said does not require it.

H. W. Fowler, A Dictionary of Modern English Usage, 1926.
There was a minor craze for [Saxon words] early in this century, giving us all manner of quaint pseudo-archaisms like skysill for horizon, but it has passed, and with it any notion of a special virtue inherent in 'native' roots. It remains broadly true that, as compared with derivatives of Latin, a decent proportion of Saxonisms in the vocabulary is a sign of a good writer, but the reader should never be allowed to suspect that this is the result of any conscious policy of choice on the writer's part. What 'a decent proportion' amounts to cannot be defined, and it seems easier and safer to approach the problem from the other end and work on the principle that a preponderance of classically derived words in what one writes, especially words denoting abstract qualities or things, especially polysyllables, especially those ending in -tion or -sion, is a bad sign. That is Rule 1.

Rule 2 annoyingly goes back a little way and says, Never choose to write one word rather than another on the sole ground that it has an Old or Middle English pedigree and its competitor comes from a Latin, French, anyway non-English root. In particular, never choose an English-descended word like forebear when a foreign one like ancestor seems more familiar and natural.

Kingsley Amis, The King's English: A Guide to Modern Usage, 1998.

So, "avoid abstract & roundabout & bookish," but prefer "familiar and natural" terms. Yet what of questions of propriety and obscenity? These seem wholly orthogonal to matters of clarity and elegance of expression.

Tuesday, December 11, 2007

probability and public policy VI: "priors" and the fair coin revisited

[part one of this series here]

A very real problem in public policy decisions is the role of the "prior," or the probability assignment before one has been presented with any evidence. Two politicians with very different priors, when presented with the same evidence, will come to very different conclusions. The beauty of subjective Bayes is that it gives us an analysis of how this phenomenon is a byproduct of rationality.

Consider, for example, the debacle over WMDs in Iraq. Many critics attribute irrationality to the policy makers who judged the probability that Iraq possessed WMDs to be high enough to justify an invasion. A disadvantage of this approach is that it rules out the prospects for dealing strategically with these policy makers (via debates, speeches, compromises, etc.), as all theories of strategic interaction presume the rationality of one's opponent. The subjective Bayes approach allows us to characterize the conclusions of these policy makers as rational given an appropriate assignment of priors.

Before discussing more realistic scenarios, let's examine a toy example. A coin has been tossed 4 times, with outcomes THTT (tails, heads, tails, tails). Now, consider 3 politicians:

Politician Q is a frequentist

Politician R is a Bayesian who believes strongly in hypothesis A, namely that P(H) = 1/2

Politician S is a Bayesian who believes strongly in hypothesis B, namely that P(H) = 1/100

When presented with the same data set, Q, R, and S will each come to different conclusions, respectively:

Politician Q will believe hypothesis C, namely that P(H) = 1/4

Politician R will continue to believe (at roughly the same strength) hypothesis A, namely that P(H) = 1/2

Politician S will continue to believe (at greatly reduced, but still more than 50% strength) hypothesis B, namely that P(H) = 1/100

[For the calculations and relevant simplifying assumptions, please see the appendix.]

Many simplifying assumptions were made here, but the essential point still stands: given suitably strong priors and suitably ambiguous evidence, rational policy makers can disagree.

What of our frequentist here? In this example, perhaps, he seems better off. However, we should not forget the conceptual problems associated with frequentism, especially the problem of one time probabilities. For example, consider a situation like "climate change": the prospects for running the relevant "experiment" repeatedly (letting industrial society evolve on earth an infinite number of times?) are nil, yet the need for some kind of conclusion is unavoidable. Returning to the case of WMDs, there is a similar situation: the relevant evidence does not allow for a "reading off" of the probability in as clear a manner as successive coin tosses.

Obviously, all relevant positions have been greatly simplified. The essential point to make here is that neither "science" nor "rationality" dictate the correct policy responses in the face of uncertainty. Furthermore, failure to acknowledge this point weakens one's position in the ensuing debate as one is left unable to strategically militate for one's own position (as one cannot model one's opponent as rational).

We turn next to some more realistic policy issues and the specific complications which arise in dealing with the relevant probabilities.

next: cancer

probability and public policy VI (appendix)

We present here the calculations relevant to the above argument.

Politician Q is a frequentist, so he simply reads the probability for heads off the data = 1/4.

Politicians R and S are Bayesians, but in order to simplify the problem, we need to distinguish their hypotheses about the underlying probability from their degree of confidence in those hypotheses. Let us say that each politician judges the probability of their favorite hypothesis to be 9/10. Call politician R's hypothesis (that the coin is fair) A and politician S's hypothesis (that the coin is heavily biased towards Tails) B. Furthermore, we simplify by assuming that A and B are mutually exclusive and jointly exhaustive, i.e. that there are no other possible hypotheses under consideration for either politician (we revisit this assumption in the sequel).

A = hypothesis that P(H) = 1/2 (i.e. that the coin is fair)
B = hypothesis that P(H) = 1/100 (i.e. that coin is biased strongly against H)
D = THTT (our data)

Then, using subscripts R and S for the beliefs of the respective politicians, we set

PR(A) = 9/10, so
PR(B) = 1/10, and
PS(B) = 9/10, so
PS(A) = 1/10

It is important to note here that the order of the heads and tails is completely irrelevant; this is what allows us to apply the binomial distribution to calculate P(D|A) and P(D|B) (values which hold independently of the relevant politician).

[apologies: due to the apparent incompatibility of blogger with html math tags, I will use n{choose}k as short hand for the standard notation]

Binomial distribution as a function of n = number of trials, k = number of positives (in this case Heads), p = P(H):

P(k positives in n trials) = (n {choose} k) p^k (1-p)^(n-k), where (n {choose} k) = n! / (k!(n-k)!)

so, for us,

(4 {choose} 1) = 4! / (1!(3)!) = 4

which gives, writing p_X for the value of P(H) according to hypothesis X ∈ {A, B}:

P(D|X) = 4(p_X)^1(1-p_X)^3 = 4(p_X)(1-p_X)^3


P(D|A) = 4(1/2)(1/2)^3 = 1/4

P(D|B) = 4(1/100)(99/100)^3 = 4(1/100)(970,299/1,000,000) = 3,881,196/100,000,000 ≈ (3.8×10^6)/(10^8) ≈ 4/100

In order to apply Bayes Rule, we also need the value of P(D). Unlike the conditional probabilities just calculated, P(D) will differ for each politician's probability distribution.

PR(D) = PR(A)PR(D|A) + PR(B)PR(D|B) = (9/10)(1/4) + (1/10)(4/100) = (9/40) + (4/1000) ≈ 1/4

PS(D) = PS(A)PS(D|A) + PS(B)PS(D|B) = (1/10)(1/4) + (9/10)(4/100) = (1/40) + (36/1000) = (25/1000) + (36/1000) = 61/1000 ≈ 6/100

[excuse the rough rounding, but it won't change the qualitative result even if it introduces small inaccuracies]

Now we can simply apply Bayes' Rule to determine how strongly the politicians R and S will believe their respective favorite theories after presentation with the evidence D:

PR(A|D) = PR(D|A)PR(A)/PR(D) ≈ (1/4)(9/10)/(1/4) = 9/10

PS(B|D) = PS(D|B)PS(B)/PS(D) ≈ (4/100)(9/10)/(6/100) = (2/3)(9/10) = 6/10

It should be clear that we can increase politician S's certainty of his original conclusion in the face of this evidence by increasing his initial degree of belief in that conclusion. Note that even if we include more than 2 possible hypotheses, this conclusion still holds (as these hypotheses will be weighted so weakly for each agent as to not be affected by such a small amount of evidence). The point here is just (to reiterate) that given sparse or ambiguous evidence and sufficiently strong priors, it is quite possible for rational agents to disagree dramatically about the pertinent conclusion to draw from the data.
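The appendix's arithmetic can be checked with exact fractions. Here is a minimal sketch in Python; the code is not from the post, and the helper names (likelihood, posterior) are my own:

```python
# Check of the appendix calculation using exact fractions, so the rough
# rounding above introduces no error. Hypotheses and prior strengths
# follow the toy example in the post.
from fractions import Fraction
from math import comb

def likelihood(p_heads, n=4, k=1):
    """P(D|X) via the binomial distribution: k heads in n tosses."""
    return comb(n, k) * p_heads**k * (1 - p_heads)**(n - k)

p_A = Fraction(1, 2)    # hypothesis A: the coin is fair
p_B = Fraction(1, 100)  # hypothesis B: heavily biased towards Tails

def posterior(prior_fav, lik_fav, lik_other):
    """Bayes' Rule with two mutually exclusive, exhaustive hypotheses."""
    prior_other = 1 - prior_fav
    evidence = lik_fav * prior_fav + lik_other * prior_other
    return lik_fav * prior_fav / evidence

# Politician R: prior of 9/10 on A; politician S: prior of 9/10 on B.
post_R = posterior(Fraction(9, 10), likelihood(p_A), likelihood(p_B))
post_S = posterior(Fraction(9, 10), likelihood(p_B), likelihood(p_A))

print(float(post_R))  # ≈ 0.983: R's confidence in the fair coin barely moves
print(float(post_S))  # ≈ 0.583: S's confidence in B is dented but still > 1/2
```

The exact values (≈ 0.983 and ≈ 0.583) agree qualitatively with the rounded 9/10 and 6/10 above: with a strong enough prior, four tosses barely move either politician.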

"it's really not a game, dog"

. . . wild as the Taliban, 9 in my right, 45 in my other hand . . .
~ T.I.

I put soap in my eye
Make it red so I look raa, ra ra
So I woke up with my holy quran and found out I like Cadillacs
We shooting till the song is up
Little boys are acting up and
Baby mothers are going crazy
And the leaders all around cracking up
We goat-rich, we fry
Price of living in a shanty town just seems very high
But we still like T.I.
But we still look fly
Dancing as we're shooting up
And looting just to get by.
~ M.I.A.

I'm like the second plane that made the Towers face off,
that shit that let you know it's really not a game, dog . . .
~ Mos Def

Switchblade, grenade, rhyme flows
Fuck niggaz like wild rhinos
Up in these killing fields you bound to die slow
Your style staggers like a drunken wino
That's why there's no hope to defeat a Black Knight
That's like tryin to walk a tight rope
with no feet, mercenary team, streets of concrete
Sasquatch thump a nigga ass, so why try the
Invincible, Dr. Destructor
My lyrics bring war like Lebanon
Our troupe's a Desert Storm, it be on son
Compton is the city where I come from
Act dumb if you want to, and catch a hot one
It's that real, knuckle up, lace your boots tight,
Don't give a fuck 'cause every night is our night
~ Dr. Doom

Trash icons, smash, spit bionic poems
Fuck bygones, rely on Islam and my python
Squeeze off, long fist, when I'm pissed
Result of this, gun powder cover my wrist,
black list . . .
~Killa Sin

Sunday, December 9, 2007

understanding the mind VI

from Marvin Minsky, The Society of Mind, 1988

Thursday, December 6, 2007

obscenity or euphemism?

The events of 1066 continue to affect modern usage, in particular the practice of forbidding the use of particular words of Saxon origin in the public sphere. If the preference for Latinate roots in modern English is merely a reflection of Norman hegemony, can we find alternate grounds for choosing our words which do not depend upon the accidents of conquest?
If we seek a purely pragmatic solution, we should perhaps heed the advice of Orwell, 1946:
Foreign words and expressions such as cul de sac, ancien regime, deus ex machina, mutatis mutandis, status quo, gleichschaltung, weltanschauung, are used to give an air of culture and elegance. Except for the useful abbreviations i.e., e.g., and etc., there is no real need for any of the hundreds of foreign phrases now current in the English language. Bad writers, and especially scientific, political, and sociological writers, are nearly always haunted by the notion that Latin or Greek words are grander than Saxon ones, and unnecessary words like expedite, ameliorate, predict, extraneous, deracinated, clandestine, subaqueous, and hundreds of others constantly gain ground from their Anglo-Saxon opposite numbers. [footnote: An interesting illustration of this is the way in which English flower names which were in use till very recently are being ousted by Greek ones, Snapdragon becoming antirrhinum, forget-me-not becoming myosotis, etc. It is hard to see any practical reason for this change of fashion: it is probably due to an instinctive turning away from the more homely word and a vague feeling that the Greek word is scientific.]
. . . . . . . . .

The defense of the English language . . . has nothing to do with archaism, with the salvaging of obsolete words and turns of speech, or with the setting up of a "standard English" which must never be departed from. On the contrary, it is especially concerned with the scrapping of every word or idiom which has outworn its usefulness. It has nothing to do with correct grammar and syntax, which are of no importance so long as one makes one's meaning clear, or with the avoidance of Americanisms, or with having what is called a "good prose style." On the other hand, it is not concerned with fake simplicity and the attempt to make written English colloquial. Nor does it even imply in every case preferring the Saxon word to the Latin one, though it does imply using the fewest and shortest words that will cover one's meaning. What is above all needed is to let the meaning choose the word, and not the other way around.

Of course, if fewer and shorter words are called for, Saxonate terms will often win the day over their Latinate analogs. Nevertheless, the colonial mindset which deems these words "dirty" or "obscene" may at first hamper their use in the public sphere. To combat this prejudice, we may compare not the political history of these terms, but rather their etymological history. Such a Nietzschean "genealogy" may provide us with an alternate perspective from which to compare the relevant terms without the burden of spurious (i.e. politically-inculcated, or slave mentality) moralistic bias.

Consider, for example, shit, shit and feces, defecate: if we examine their etymologies, does one emerge from a more innocent, i.e. euphemistic perspective on the intended referent than the other? [etymologies courtesy Shipley, 1984]

Although the Indo-European root of feces is unclear (perhaps *bhƒy-), its more recent history is well known:

feces, however, is from L[atin] faex, faeces: sediment, dregs. The basic sense of defecate is to clear out the dregs, cleanse, purify. Thus, Robert Burton, in The Anatomy of Melancholy (1621), states that Luther "began upon a sudden to defecate, and as another sun to drive away, those foggy mists of superstition." And fallible man is comforted by H. Macmillan in The True Vine (1870): "By the death of the body, sin is defecated."

Shit, on the other hand, can be traced back to the Indo-European sek, to cut, separate, or divide:

shite, shit, dropped from the animal; earlier skate, skite, as in blatherskite. blather: to talk nonsense loquaciously, as with verbal diarrhea. skate: shitter, originally a Scotch term of contempt, is now softened in the colloquial "He's a good skate." The Scotch song Maggie Lauder, by F. Sempill, 1650, a favorite with the American Army in the Revolution, contains the line: "Jog on your gait, ye blatherskate." Variants are bletherumskite and blatherskite. An informative Paston family letter written in 1449, relates: "I cam abord the Admirall, and bade them stryke [pull down their flag] in the Kyngys name, and they bade me skyte in the Kyngys name." (Note that sk, as still in Scandinavian tongues, was long sounded sh in English.)

Perhaps by some standards, then, shit is the more euphemistic, as it initially referenced the act of separation, not the waste itself (as with feces). Nevertheless, the much longer history of shit in English indicates its use in literal reference to bowel movements dates back at least to 1449, while defecate enjoyed a more general [metaphorical?] sense of removing waste at least as late as 1870. Here, again, however, it seems impossible to separate out the role of Latinate bias in such choices, and the prospects for any objective account of relative "obscenity" seem dim.

[As should be expected given the inherently subjective nature of the language - world relationship.]

Wednesday, December 5, 2007

probability and public policy V: propensity theories

[part one of this series here]

As discussed before, frequentism suffers from some conceptual problems as an objective theory of probability. One problem we have not discussed, however, is that posed by one time events. Consider, for example, betting odds on a sporting event. I estimate that the probability the Aggies will beat the Longhorns in their upcoming game is 1/3, and bet accordingly. Now, this game is a one time, unrepeatable event; can we make objective sense of such a probability assignment? (If the fact that the Aggies and the Longhorns have met many times in the past is throwing you, consider this: on each meeting, the teams have had different players, different coaches, and different records for the season; thus, these are not repeats of the same event as with the tossing of a fair coin.)

Another popular example of one time probabilistic events is radioactive decay: when speaking of the probability that a lump of uranium will emit an α-particle within some time period t, we cannot be referring to the frequency of the outcome of a process (what process? something internal to the uranium? ~ but uranium emits particles "spontaneously," surely if there is such a process it is unobservable).

One solution to these worries is to interpret probability in terms of propensity: to say the Aggies only have a 1/3 chance of beating the Longhorns is to speak of something about the Aggies (the makeup of the team, the strategies they use, the quality of the coaching, etc.) which objectively determines their chances of winning this one time event. In the case of the uranium, we can say it has the propensity to decay at a certain rate. In the case of a coin, we can say a coin is "fair" if it has a propensity to come up heads with probability 1/2. Here, it makes sense to speak of the coin (or, more specifically, the mechanism of the toss) as being "fair" or not (as exhibiting a certain structure) even before the coin has been tossed a single time ~ no notion of a hypothetical infinity of trials is needed.

However, there are conceptual problems with the propensity interpretation as well. In particular, propensities cannot themselves be probabilities. The symmetry in the probability calculus which allowed us to derive Bayes' Rule is not exhibited by propensities (as pointed out in Humphreys, 1985):

The point can be illustrated by means of a simple scientific example. When light with a frequency greater than some threshold value falls on a metal plate, electrons are emitted by the photoelectric effect. Whether or not a particular electron is emitted is an indeterministic matter, and hence we can claim that there is a propensity p for an electron in the metal to be emitted, conditional upon the metal being exposed to light above the threshold frequency. Is there a corresponding propensity for the metal to be exposed to such light, conditional on an electron being emitted, and if so, what is its value? Probability theory provides an answer to this question if we identify conditional propensities with conditional probabilities. The answer is simple: calculate the inverse probability from the conditional probability. Yet it is just this answer which is incorrect for propensities, and the reason is easy to see. The propensity for the metal to be exposed to radiation above the threshold frequency, conditional upon an electron being emitted, is equal to the unconditional propensity for the metal to be exposed to such radiation, because whether or not the conditioning factor occurs in this case cannot affect the propensity value for that latter event to occur. That is, with the obvious interpretation of the notation, Pr(R/¬E) = Pr(R/E) = Pr(R). However, any use of inverse probability theorems from standard probability theory will require that P(R/E) = P(E/R)P(R)/P(E), and if P(E/R) ≠ P(E), we shall have P(R/E) ≠ P(R). In this case, because of the influence of the radiation on the propensity for emission, the first inequality is true, but the lack of reverse influence makes the second inequality false for propensities.

To take another example, heavy cigarette smoking increases the propensity for lung cancer, whereas the presence of (undiscovered) lung cancer has no effect on the propensity to smoke, and a similar probability calculation would give an incorrect result. Many other examples can obviously be given.

The point here is just that probabilities are symmetric with respect to causal order; this is the trick which allowed us to derive Bayes' Rule. Yet propensities cannot be made sense of in this way; they are asymmetric with respect to causal order.
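To make the asymmetry concrete: the probability calculus will happily "invert" the smoking example, even though the resulting number is meaningless as a propensity. A small Python sketch; all numbers here are hypothetical, chosen only to make the arithmetic visible, and none come from Humphreys:

```python
# Bayes' Rule mechanically inverts any conditional probability; whether
# the inverted value makes sense as a propensity is another matter.

def bayes_invert(p_e_given_r, p_r, p_e):
    """P(R|E) = P(E|R) * P(R) / P(E)."""
    return p_e_given_r * p_r / p_e

p_smoke = 0.3                # P(R): rate of heavy smoking (hypothetical)
p_cancer_given_smoke = 0.15  # P(E|R): cancer rate among smokers (hypothetical)
p_cancer = 0.06              # P(E): overall cancer rate (hypothetical)

# The calculus returns P(smoke | cancer) = 0.75 without complaint; read
# as a propensity ("undiscovered cancer disposes one to smoke"), it is
# nonsense, which is Humphreys' point.
print(bayes_invert(p_cancer_given_smoke, p_smoke, p_cancer))  # 0.75
```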

Nevertheless, Suppes (1987, 2002) has argued that in particular cases one can derive a probability calculus from propensities. This is why we speak of propensity theories, in the plural: there can be no one unified derivation of the probability calculus from a general theory of propensity.

Now that we've got some different perspectives on the table, let's see how they deal with a couple of examples.

next: "priors" and the fair coin revisited

Saturday, December 1, 2007

understanding the mind V: color vision

Why do we use words like white, black, red, yellow, etc. to describe human skin color when the related skin tones aren't "really" white, black, red, etc.?

There are four types of light-sensitive cells in the retina: the rods and the S, M, and L cones (for, roughly, "short," "medium," and "long" wavelengths). These cells contain a molecule which changes shape when hit with a sufficient number of photons, but each type is sensitive to light of a different range of wavelengths.
E. Bruce Goldstein, Sensation and Perception, 7th edition, 2006

The S, M, and L cones interact to produce color vision. We can test the sensitivity of each type of cell at a variety of different wavelengths to determine its sensitivity profile.

Brian A. Wandell, Foundations of Vision, 1995

These three types of color cell are "wired" into two opponent color circuits, the red-green and the blue-yellow circuits.

E. Bruce Goldstein, Sensation and Perception, 7th edition, 2006

This process separates the highly correlated M and L cone signals in order to provide a richer color space. A consequence of the wiring from three wavelength detectors to two opponent color circuits is a circular color space, familiar to many as the color wheel. When we graph this color circle against the dimension of brightness, we get a spindle shaped space corresponding to the subjective perception of color.

Paul M. Churchland, The Engine of Reason, The Seat of the Soul, 1996

Peter Gärdenfors has observed that if we consider the spindle shaped subspace of this color space which corresponds to possible human skin tones and attach our basic color words to the corresponding parts of this subspace, we can retrieve the use of these terms in describing human skin tone.

Peter Gärdenfors, Conceptual Spaces: The Geometry of Thought, 2004

Here we have an example of a linguistic structure, an analogy, which is suggested, perhaps even forced, by the physiological structure of human perception. . . . and how many more also are?

probability and public policy IV: frequentism

[part one of this series here]

The subjective view of probability leaves probability entirely in our heads: it is merely a reflection of our uncertainty about the relevant details. An objective view of probability places probability "out in the world" somehow. The first such theory we will examine is frequentism.

The frequentist believes probability attributions make a hypothetical claim about an infinite number of trials. Consider again our coin toss; for a frequentist, the claim that the probability of heads on a given toss is 1/2 means that if the coin were tossed an infinite number of times, then 1/2 of those tosses would come out heads. This approach considers probability as a hypothetical property of a physical system, but is extremely problematic when we examine the details.

Suppose the underlying physical mechanism of the toss is indeed fair, i.e. the probability "really" is 1/2: what guarantees that half of the trials will come out heads in the limit? This relationship between the limit of the proportion of heads as the number of tosses goes to infinity and the underlying physical mechanism is theoretically guaranteed (with probability one) by the law of large numbers.

In order to further clarify this distinction between the frequency and the physical mechanism, consider another classic example, this time from Bernoulli. Suppose one fills an urn with 50 red balls and 50 green balls. Then, one uses some procedure (say shaking the urn then reaching in with one's eyes closed) in order to select from the urn at random. After each selection of a ball from the urn (a "trial") the color of the ball is noted and it is returned to the urn. Here, there is a physical fact about the ratio of red balls to total balls in the urn (50/100 = 1/2); there is also a physical process to randomly select balls. What the law of large numbers tells us is that if the process for selecting balls from the urn is indeed "random," then the ratio of red balls to total balls examined will approach 1/2 as the number of trials grows.
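A quick simulation illustrates what the law of large numbers promises for Bernoulli's urn. This sketch is my own, not from the text, and the trial counts are arbitrary:

```python
# Monte Carlo sketch of Bernoulli's urn: draw with replacement from an
# urn of 50 red and 50 green balls; the running ratio of red draws
# approaches the physical ratio 1/2 as trials accumulate.
import random

def red_ratio(n_trials, n_red=50, n_total=100, seed=0):
    rng = random.Random(seed)  # fixed seed, for reproducibility
    reds = sum(rng.randrange(n_total) < n_red for _ in range(n_trials))
    return reds / n_trials

for n in (10, 1_000, 100_000):
    print(n, red_ratio(n))  # the ratio settles near 0.5 as n grows
```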

There are two related remarks to make here. First, the physical systems associated with most classic probability setups will deteriorate over the course of such an extended number of trials. For example, if one rolls a die several hundred thousand times, it begins to turn spherical as the corners chip away from friction. The point here is just that if the procedure of rolling the die were actually carried out an enormous number of times, the proportion of rolls returning five would only approach 1/6 for a while, until the die had deteriorated enough that the physical properties of the system changed, at which point it is no longer clear that an unambiguous answer of five would even be possible.

This brings us to our second remark. The assumption behind the law of large numbers and the urn, coin, and die examples is that the underlying physical processes will produce stable probabilities. Logically, however, nothing rules out the possibility of a coin which comes up heads 1/2 the time on the first 100 tosses, but 1/4 the time on the next 100 tosses. In fact, the die example shows that there are strong empirical reasons for suspecting that many systems do not in fact demonstrate this long term stability.
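The logical possibility described here is easy to exhibit in simulation. In this sketch (my own construction, with arbitrary parameters), the "coin" switches its heads rate after 100 tosses, and the long-run frequency tracks the later rate rather than any single stable probability:

```python
# A "coin" whose heads probability shifts from 1/2 to 1/4 after 100
# tosses: the observed frequency never settles on the early value.
import random

def shifting_frequency(n_tosses, switch_at=100,
                       p_before=0.5, p_after=0.25, seed=1):
    rng = random.Random(seed)  # fixed seed, for reproducibility
    heads = sum(rng.random() < (p_before if i < switch_at else p_after)
                for i in range(n_tosses))
    return heads / n_tosses

print(shifting_frequency(200))      # near 0.375, averaging the two regimes
print(shifting_frequency(100_000))  # near 0.25: the early regime washes out
```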

So, frequentism succeeds in putting probability "in the world," but at the cost of plausibility. Still, there are many situations for which frequentism is an especially useful view, and it is one of the prime competitors against subjective Bayes in the realm of statistics.

next: propensity theories