28 Sept 2000 
Sexing your Meep  
A little foray into Bayesian inference 
So, earlier this week, noting the change to fall-like weather, I donned my 
usual fall uniform: leather hat, leather jacket, leather vest, pirate 
shirt, blue jeans, leather shoes.  With hair tucked up under the hat, I 
got my usual "Was that a woman or a man?" looks.   
So, as a public service, I thought I'd talk about how to determine the 
sex of a random person.   
So you've got a Meep walking down the road -- how can you determine Meep's 
gender?  Flip Meep over and check between the legs.  Ooops, that's how you 
sex chickens.  So if you're not going to be able to persuade me to remove 
my clothing or give you a blood or tissue sample (I'm =pretty= sure I'm XX), 
you're going to have to go with observable cues.  That's where inference 
comes in. 
So, while I'm trying to look up demographic journals to give me actual 
statistics, let me explain conditional probability to you. 
We've all run into conditional probability, mainly because we don't live 
in a fog of ignorance all the time ('we don't?').  How often have we asked 
"What's the chance of my plane leaving on time given that there are major 
thunderstorms over Chicago?", "What's the chance of me going to the gym 
tomorrow morning, given that it's 3 am right now and I'm trying to finish 
Harry Potter?", "The other guy has three of a kind showing, but I'm so 
sure that my flush will beat his hand!  However, what's the probability he 
will beat me, also given that he keeps raising my bets?" (for more info on 
that last question, see me.  I'm thinking of writing a book: Meep's 
Complete Poker Probability Bible.) 
So there's this thing called "conditional probability": P(A | B) = "the 
probability of A given B".  For example, what's P(in a family of 2 
children, both are boys | at least one is a boy)?  And what's P(both boys 
| oldest one is a boy)?  Are those probabilities the same?  (Answer at 
bottom of file) 
How does one calculate conditional probabilities?  There are two ways.  The 
first: do it directly.  This is easy if the probability of each outcome is 
equal.  So, if I tell you that someone has a hand of all red cards, what's 
the probability of them holding a heart flush?  I can just count up the 
number of heart flushes and divide by the number of hands made of hearts 
and diamonds.  Every hand is equally likely.  No sweat. 
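Here is that counting argument as a minimal Python sketch.  It assumes a 
standard 52-card deck, a 5-card hand, and that "heart flush" means any five 
hearts (straight flushes included); none of that is spelled out above, so 
adjust to taste. 

```python
from math import comb

# Number of 5-card hands drawn only from the 26 red cards (hearts + diamonds).
red_hands = comb(26, 5)

# Number of 5-card hands drawn only from the 13 hearts.
heart_hands = comb(13, 5)

# Every red hand is equally likely, so the conditional probability is a ratio of counts.
p_heart_flush_given_red = heart_hands / red_hands

print(heart_hands, red_hands, round(p_heart_flush_given_red, 4))
# prints: 1287 65780 0.0196
```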
But what if you can't calculate it directly?  That's the second way.  Here's a 
nice little formula: 
P(A | B) = P(A & B) / P(B) 
Remember that.  Tattoo it somewhere you can read it (remember - if you 
tattoo it on your belly, put it upside-down.  if on your ass, tattoo it 
So to get the conditional probability of A given B, calculate the 
probability of both A & B happening, and divide by the probability of B. 
This formula can be seen in another form as well (just minor algebra): 
P(A & B) = P(A | B) P(B) 
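A quick check of both forms of the formula on the red-cards example, as a 
sketch.  Same assumptions as before (52-card deck, 5-card hands, any five 
hearts counts as a heart flush); exact fractions are used so the two forms can 
be compared without rounding. 

```python
from fractions import Fraction
from math import comb

all_hands = comb(52, 5)  # every 5-card hand from the full deck

# B = the hand is all red; A = the hand is all hearts (so A & B is just A).
p_b = Fraction(comb(26, 5), all_hands)
p_a_and_b = Fraction(comb(13, 5), all_hands)

# P(A | B) = P(A & B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # prints: 9/460

# The rearranged form: P(A & B) = P(A | B) P(B)
assert p_a_given_b * p_b == p_a_and_b
```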
So let's see what we can do with this info. 
It seems that the appropriate statistics are unavailable online, so I'm 
just going to pull them out of my ass.  Which is appropriate, for my first 
inference involves the ass. 
Now, there Meep goes, just a walkin' down the street... 
(singin doo wah ditty ditty dum ditty doo...) 
What's the first thing you notice?  Today, that is, when I'm wearing a 
short vest and shirt tucked in.  Yes!  Meep's got an ass! 
Now, one thing I've noticed throughout my life is that, if you're a woman, 
chances are good that you have an ass.  And if you're a man, chances are 
less that you have an ass. 
so let M = person is a man 
       W = person is a woman 
       A = person has an ass 
fake stats: 
P(A | W) = 70% 
P(A | M) = 50% 
we want to know: P(W | A) - probability a person is a woman, given that 
they've got an ass. 
Now let's see what we need: 
P(W | A) = P(W & A) / P(A) = P(A | W) P(W) / P(A) 
We already have P(A | W).  What's P(W)?  This is where =priors= come 
in.  You're trying to determine if someone's a man or a woman.  You have 
some prior probability in mind that they could be a particular 
gender.  Let's say this is at rush hour in Manhattan, Meep walking down 
the street.  Chances are about 50/50 that a person is one gender or 
another.  Now if you had been talking about walking around in the middle 
of the day in Afghanistan, the priors would've been way different.  As in, 
the prior probability of being a woman would be 0, as any woman wandering 
around would be immediately executed by the Taliban. 
But back to Meep's back property.  So let's assume P(W) = P(M) = 50%. 
What about P(A)?  Here we go with the old divide and conquer strategy.   
If I keep an old-fashioned view of the world, the event S that a person is 
a human (as in Homo sapiens) = W union M.  Also old-fashioned, I assume M & 
W are disjoint (no overlap).  So keep your she-males to yourself; at this 
point, things are complicated enough, so take transgender issues 
elsewhere.  Now this is cute.  Watch the probability fly: 
P(A) = P(A & S) = P(A & (W union M)) = P(A & W) + P(A & M) 
neat!  I split the event "having an ass" into two sub-events: being a 
woman with an ass, and being a man with an ass. 
So now I've got: P(A) = P(A & W) + P(A & M) 
		      = P(A | W)P(W) + P(A | M)P(M) 
Cool!  Now I can actually calculate stuff! 
P(A & W) = 70% * 50% = 35% 
P(A & M) = 50% * 50% = 25% 
P(W | A) = 35% / (35% + 25%) = 58% 
So far we can guess that Meep is female with 58% probability! 
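The same calculation as a small Python sketch.  The function name and its 
arguments are mine, made up for illustration; the 70%/50% figures are the fake 
stats from above, and the 20% prior in the second call is a hypothetical value 
thrown in only to show how much the prior matters. 

```python
def posterior_woman(p_cue_given_w, p_cue_given_m, p_w=0.5):
    """P(W | cue), assuming W and M are disjoint and cover everybody."""
    p_m = 1.0 - p_w
    p_cue_and_w = p_cue_given_w * p_w      # P(A & W) = P(A | W) P(W)
    p_cue_and_m = p_cue_given_m * p_m      # P(A & M) = P(A | M) P(M)
    p_cue = p_cue_and_w + p_cue_and_m      # P(A), split over W and M
    return p_cue_and_w / p_cue

# Rush hour in Manhattan: 50/50 prior.
print(round(posterior_woman(0.70, 0.50), 3))            # prints: 0.583

# Hypothetical crowd that is only 20% women: same cue, very different answer.
print(round(posterior_woman(0.70, 0.50, p_w=0.20), 3))  # prints: 0.259
```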
Next, we notice Meep has prominent hips.  Indeed, the jeans fit quite 
nicely over this hip/ass package.  Can we use this information in any way? 
Well, again, most women have prominent hips.  And even fewer men have 
prominent hips than have asses. 
W & M mean the same thing as before, but now H = person has prominent hips. 
P(H | W) = 70% 
P(H | M) = 40% 
Cool!  Let's just chug through the info as before: 
P(H & W) = P(H | W) P(W) = 70% * 50% = 35% 
P(H & M) = P(H | M) P(M) = 40% * 50% = 20% 
P(W | H) = 35% / (35% + 20%) = 64% 
Wow!  We've got a better lock on!  Meep now stands a 64% chance of being 
female!  But, truthfully, we'd like to combine our two pieces of 
information.  Indeed, what is P(W | H & A)?   
Let's see what info I'd need: 
P(W | H & A) = P(W & H & A) / P(H & A) 
Actually, we can go several ways from here.  But what we really need is to 
correlate hips & ass (these are =not= independent events, people -- women 
with asses tend to have hips, and men with hips tend to have 
asses).  Let's try to use this info to calculate: 
P(H | W & A) = 90% 
P(A | M & H) = 70% 
let's see where this gets us: 
P(W & H & A) = P(H | W & A) P(W & A) = 90% * 35% = 31.5% 
P(M & H & A) = P(A | M & H) P(M & H) = 70% * 20% = 14% 
P(W | H & A) = 31.5% / (31.5% + 14%) = 69% 
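The chain of multiplications above, written out as a Python sketch.  The 
variable names are invented for this example; the numbers are the same fake 
stats.  Note that the two sides are built up in different orders (ass first 
for women, hips first for men), exactly as in the hand calculation. 

```python
p_w = p_m = 0.5

p_a_given_w = 0.70    # P(A | W)
p_h_given_m = 0.40    # P(H | M)
p_h_given_wa = 0.90   # P(H | W & A)
p_a_given_mh = 0.70   # P(A | M & H)

# Chain rule: P(W & H & A) = P(H | W & A) P(A | W) P(W)
p_wha = p_h_given_wa * p_a_given_w * p_w

# Chain rule: P(M & H & A) = P(A | M & H) P(H | M) P(M)
p_mha = p_a_given_mh * p_h_given_m * p_m

# P(W | H & A): the woman-with-both share of everybody with both.
p_w_given_ha = p_wha / (p_wha + p_mha)
print(round(p_w_given_ha, 3))  # about 0.692
```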
Now Meep has a 69% chance of being female.  Truthfully, I don't think 
we're going to get much better than this.  You might think that combining 
the results would actually help the situation better than that, but the 
truth is that since hips and ass usually go together for both men and 
women, combining the info doesn't take you much farther.  However, would 
you like to see what happens when hips and ass don't correlate very well 
amongst women, and correlate extremely well amongst men? 
P(H | W & A) = 40% 
P(A | M & H) = 100% 
P(W & H & A) = P(H | W & A) P(W & A) = 40% * 35% = 14% 
P(M & H & A) = P(A | M & H) P(M & H) = 100% * 20% = 20% 
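P(W | H & A) = 14% / (14% + 20%) = 41% 
(Sticklers will object: if 70% of women have asses and 70% have hips, at 
least 40% of women must have both, so P(H | W & A) can't really go below 
40% / 70% = 57%.  A genuine flip needs different fake stats than these, but 
such stats do exist.) 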
I could be going from info that was convincing me someone was female, to 
info convincing me someone was male!  That's why you've got to be careful 
of correlations:  if I told you most people who had asses were female, and 
most people who had hips were female, but most people who had asses =and= 
hips were male, you'd think I was crazy.  However, this is something that 
happens in real life all the time, due to all sorts of correlations.  This 
is something called Simpson's paradox. 
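To see that this kind of reversal really can happen, here is a sketch with a 
small, fully specified joint distribution.  The numbers are invented for this 
example only (they are not the fake stats used earlier); the point is just 
that they add up to 1 and still produce the "crazy" pattern. 

```python
from fractions import Fraction as F

# joint[(sex, has_ass, has_hips)] = probability.  Hypothetical numbers.
joint = {
    ("W", True, True): F(2, 20), ("W", True, False): F(4, 20),
    ("W", False, True): F(4, 20), ("W", False, False): F(0),
    ("M", True, True): F(5, 20), ("M", True, False): F(0),
    ("M", False, True): F(0), ("M", False, False): F(5, 20),
}
assert sum(joint.values()) == 1  # a legitimate probability distribution

def p_woman_given(cond):
    """P(W | cond), where cond(has_ass, has_hips) picks out the observed cue."""
    kept = {k: v for k, v in joint.items() if cond(k[1], k[2])}
    women = sum(v for k, v in kept.items() if k[0] == "W")
    return women / sum(kept.values())

print(p_woman_given(lambda a, h: a))        # prints: 6/11  (most asses: female)
print(p_woman_given(lambda a, h: h))        # prints: 6/11  (most hips: female)
print(p_woman_given(lambda a, h: a and h))  # prints: 2/7   (most with both: male)
```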
I'll give you an example from an old Stats text.  In the era of burgeoning 
women's rights, someone at Berkeley thought they'd look into the graduate 
programs at Berkeley and their admissions rate for women vs. men.  Ah-ha!   
A larger percentage of men than women were accepted!  Sexual 
discrimination! 
However, though this was hot stuff, it wasn't enough to flesh out a 
research paper, so they decided to see if they could see which departments 
were the main culprits.  The departments, after all, were the level where 
the actual acceptance/rejection thing was going on. 
Department by department, the gap mostly disappeared; in a number of them, 
women had the =higher= acceptance rate. 
What was going on?  More women were applying to programs that were more 
competitive.  So, for example, the education department had a lot of 
applicants, mostly women, and had a low acceptance rate.  On the other 
hand, physics had an applicant pool that was mostly men, but they had 
fewer applicants as a whole, and had a higher acceptance rate.  Women 
=overall= had a higher rejection rate because women flocked to the subject 
areas where the rejection rate was higher.  Likewise, men "played it 
safe" and mainly went for subject areas with less competition. 
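A toy version of the effect in Python.  These are NOT the Berkeley figures; 
the two departments and all the counts are hypothetical, chosen so that women 
have the higher acceptance rate in each department and still the lower rate 
overall. 

```python
# Hypothetical data: (applicants, admitted) per department and sex.
applications = {
    "competitive": {"women": (800, 240), "men": (200, 50)},
    "less competitive": {"women": (200, 160), "men": (800, 600)},
}

def rate(applied, admitted):
    return admitted / applied

# Within each department, women do better than men.
for dept, by_sex in applications.items():
    print(dept, {sex: rate(*counts) for sex, counts in by_sex.items()})

def overall(sex):
    """Acceptance rate for one sex, pooled over all departments."""
    applied = sum(by_sex[sex][0] for by_sex in applications.values())
    admitted = sum(by_sex[sex][1] for by_sex in applications.values())
    return rate(applied, admitted)

# Pooled together, the comparison flips: women 0.4, men 0.65.
print("overall", {sex: overall(sex) for sex in ("women", "men")})
```

The flip comes entirely from where people applied: most of the women are in 
the department that rejects most applicants of either sex. 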
Interesting, ne? 
By the way, the probability of a family having two boys given that it has 
at least one boy is 1/3.  The probability of a family having two boys 
given that the elder child is a boy is 1/2. 
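If you don't believe me, count.  A minimal Python sketch, assuming boys and 
girls are equally likely and the two births are independent (so the four 
elder/younger combinations are equally likely); the helper name is made up 
for this example. 

```python
from fractions import Fraction
from itertools import product

# All (elder, younger) combinations: BB, BG, GB, GG, each equally likely.
families = list(product("BG", repeat=2))

def cond_prob(event, given):
    """P(event | given), by counting equally likely outcomes."""
    kept = [f for f in families if given(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))

def both_boys(f):
    return f == ("B", "B")

print(cond_prob(both_boys, lambda f: "B" in f))     # prints: 1/3
print(cond_prob(both_boys, lambda f: f[0] == "B"))  # prints: 1/2
```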
And my gender?  Well, I once convinced someone online that I was a man 
named Mary.  That should be good enough for you. 