28 Sept 2000

Sexing your Meep
or
A little foray into Bayesian inference

So, earlier this week, noting the change to fall-like weather, I donned my usual fall uniform: leather hat, leather jacket, leather vest, pirate shirt, blue jeans, leather shoes. With hair tucked up under the hat, I got my usual "Was that a woman or a man?" looks. So, as a public service, I thought I'd talk about how to determine the sex of a random person.

So you've got a Meep walking down the road -- how can you determine Meep's gender?

Flip Meep over and check between the legs.

Oops, that's how you sex chickens.

So if you're not going to be able to persuade me to remove my clothing or give you a blood or tissue sample (I'm =pretty= sure I'm XX), you're going to have to go with observable cues. That's where inference comes in. So, while I'm trying to look up demographic journals to give me actual statistics, let me explain conditional probability to you.

We've all run into conditional probability, mainly because we don't live in a fog of ignorance all the time ('we don't?'). How often have we asked "What's the chance of my plane leaving on time given that there are major thunderstorms over Chicago?", "What's the chance of me going to the gym tomorrow morning, given that it's 3 am right now and I'm trying to finish Harry Potter?", "The other guy has three of a kind showing, but I'm so sure that my flush will beat his hand! However, what's the probability he will beat me, also given that he keeps raising my bets?" (For more info on that last question, see me. I'm thinking of writing a book: Meep's Complete Poker Probability Bible.)

So there's this thing called "conditional probability":

P(A | B) = "the probability of A given B".

For example, what's P(in a family of 2 children, both are boys | at least one is a boy)? And what's P(both boys | oldest one is a boy)? Are those probabilities the same? (Answer at bottom of file)

How does one calculate conditional probabilities?
There are two ways. The first is to do it directly. This is easy if the probability of each outcome is equal. So, if I tell you that someone has a hand of all red cards, what's the probability of them holding a heart flush? I can just count up the number of heart flushes and divide by the number of hands made of hearts and diamonds. Every hand is equally likely. No sweat.

But what if you can't calculate it directly? Here's the second way, a nice little formula:

P(A | B) = P(A & B) / P(B)

Remember that. Tattoo it somewhere you can read it (remember: if you tattoo it on your belly, put it upside-down; if on your ass, tattoo it backwards).

So to get the conditional probability of A given B, calculate the probability of both A & B happening, and divide by the probability of B occurring. This formula can be seen in another form as well (just minor algebra manipulation):

P(A & B) = P(A | B) P(B)

So let's see what we can do with this info. It seems that the appropriate statistics are unavailable online, so I'm just going to pull them out of my ass. Which is appropriate, for my first inference involves the ass.

Now, there Meep goes, just a walkin' down the street... (singin doo wah ditty ditty dum ditty doo...) What's the first thing you notice? Today, that is, when I'm wearing a short vest and shirt tucked in.

Yes! Meep's got an ass!

Now, one thing I've noticed throughout my life is that, if you're a woman, chances are good that you have an ass. And if you're a man, chances are less that you have an ass.

So let

M = person is a man
W = person is a woman
A = person has an ass

Fake stats:

P(A | W) = 70%
P(A | M) = 50%

We want to know P(W | A), the probability a person is a woman given that they've got an ass.

Now let's see what we need:

P(W | A) = P(W & A) / P(A) = P(A | W) P(W) / P(A)

We already have P(A | W). What's P(W)? This is where =priors= come in. You're trying to determine if someone's a man or a woman. You have some prior probability in mind that they could be a particular gender.
Let's say this is at rush hour in Manhattan, Meep walking down the street. Chances are about 50/50 that a person is one gender or another. Now if you had been talking about walking around in the middle of the day in Afghanistan, the priors would've been way different. As in, the prior probability of being a woman would be 0, as any woman wandering around would be immediately executed by the Taliban. But back to Meep's back property.

So let's assume P(W) = P(M) = 50%.

What about P(A)? Here we go with the old divide and conquer strategy. If I keep an old-fashioned view of the world, the event S that a person is a human (as in homo Sapiens) = W union M. Also old-fashioned, I assume M & W are disjoint (no overlap). So keep your she-males to yourself; at this point, things are complicated enough, so take transgender issues elsewhere.

Now this is cute. Watch the probability fly:

P(A) = P(A & S) = P(A & (W union M)) = P(A & W) + P(A & M)

Neat! I split the event "having an ass" into two sub-events: being a woman with an ass, and being a man with an ass. So now I've got:

P(A) = P(A & W) + P(A & M) = P(A | W) P(W) + P(A | M) P(M)

Cool! Now I can actually calculate stuff!

P(A & W) = P(A | W) P(W) = 70% * 50% = 35%
P(A & M) = P(A | M) P(M) = 50% * 50% = 25%
P(W | A) = 35% / (35% + 25%) = 58%

So far we can guess that Meep is female with 58% probability!

Next, we notice Meep has prominent hips. Indeed, the jeans fit quite nicely over this hip/ass package. Can we use this information in any way? Well, again, most women have prominent hips. But even fewer men have prominent hips. W & M mean the same thing as before, but now H = person has hips.

P(H | W) = 70%
P(H | M) = 40%

Cool! Let's just chug through the info as before:

P(H & W) = P(H | W) P(W) = 70% * 50% = 35%
P(H & M) = P(H | M) P(M) = 40% * 50% = 20%
P(W | H) = 35% / (35% + 20%) = 64%

Wow! We've got a better lock on! Meep now stands a 64% chance of being female!

But, truthfully, we'd like to combine our two pieces of information. Indeed, what is P(W | H & A)?
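For anyone who would rather let a computer do the chugging, here is a minimal Python sketch of the two single-cue calculations above. The function name `posterior_woman` and its parameter names are made up for illustration, and the numbers are the same fake stats used in the text. It assumes, as the text does, that W and M are disjoint and together cover everybody.

```python
def posterior_woman(p_cue_given_w, p_cue_given_m, p_w=0.5):
    """Return P(W | cue) using P(W | cue) = P(cue & W) / P(cue).

    Assumes W and M are disjoint and exhaustive, so P(M) = 1 - P(W)
    and P(cue) = P(cue & W) + P(cue & M).
    """
    p_m = 1.0 - p_w
    joint_w = p_cue_given_w * p_w  # P(cue & W)
    joint_m = p_cue_given_m * p_m  # P(cue & M)
    return joint_w / (joint_w + joint_m)


# Fake stats from the text: P(A | W) = 70%, P(A | M) = 50%
print(f"P(W | A) = {posterior_woman(0.70, 0.50):.0%}")  # 58%

# Fake stats from the text: P(H | W) = 70%, P(H | M) = 40%
print(f"P(W | H) = {posterior_woman(0.70, 0.40):.0%}")  # 64%
```

Playing with `p_w` shows how much the prior matters: with a prior of 0, the answer stays 0 no matter what the cue says.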
Let's see what info I'd need:

P(W | H & A) = P(W & H & A) / P(H & A)

Actually, we can go several ways from here. But what we really need is to correlate hips & ass (these are =not= independent events, people -- women with asses tend to have hips, and men with hips tend to have asses... let's try to use this info to calculate):

P(H | W & A) = 90%
P(A | M & H) = 70%

Let's see where this gets us:

P(W & H & A) = P(H | W & A) P(W & A) = 90% * 35% = 31.5%
P(M & H & A) = P(A | M & H) P(M & H) = 70% * 20% = 14%

So, splitting P(H & A) into its W and M pieces just as before:

P(W | H & A) = 31.5% / (31.5% + 14%) = 69%

Now Meep has a 69% chance of being female.

Truthfully, I don't think we're going to get much better than this. You might think that combining the results would actually help the situation better than that, but the truth is that since hips and ass usually go together for both men and women, combining the info doesn't take you much farther.

However, would you like to see what happens when hips and ass don't correlate very well amongst women, and correlate extremely well amongst men?

P(H | W & A) = 40%
P(A | M & H) = 100%

P(W & H & A) = P(H | W & A) P(W & A) = 40% * 35% = 14%
P(M & H & A) = P(A | M & H) P(M & H) = 100% * 20% = 20%

So:

P(W | H & A) = 14% / (14% + 20%) = 41%

I could be going from info that was convincing me someone was female, to info convincing me someone was male! That's why you've got to be careful of correlations: if I told you most people who had asses were female, and most people who had hips were female, but most people who had asses =and= hips were male, you'd think I was crazy. However, this is something that happens in real life all the time, due to all sorts of correlations. This is something called Simpson's paradox.

I'll give you an example from an old Stats text. In the era of burgeoning women's rights, someone at Berkeley thought they'd look into the graduate programs at Berkeley and their admissions rate for women vs. men. Ah-ha! A larger percentage of men were accepted over women!
Sexual discrimination! However, though this was hot stuff, it wasn't enough to flesh out a research paper, so they decided to see if they could see which departments were the main culprits. The departments, after all, were the level where the actual acceptance/rejection thing was going on.

In =all= departments, women had a higher acceptance rate than men.

What was going on?

More women were applying to programs that were more competitive. So, for example, the education department had a lot of applicants, mostly women, and had a low acceptance rate. On the other hand, physics had an applicant pool that was mostly men, but they had fewer applicants as a whole, and had a higher acceptance rate. Women =overall= had a higher rejection rate because women flocked to the subject areas where the rejection rate was higher. Likewise, men "played it safe" and mainly went for subject areas with less competition.

Interesting, ne?

By the way, the probability of a family having two boys given that it has at least one boy is 1/3. The probability of a family having two boys given that the elder child is a boy is 1/2.

And my gender? Well, I once convinced someone online that I was a man named Mary. That should be good enough for you.
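
For the skeptics: the two-children answers above can be checked by the "do it directly" method, counting equally likely outcomes. This is a minimal Python sketch, assuming each child is independently a boy or a girl with equal probability (the puzzle never says so, but that is the usual reading). The helper name `conditional` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All four equally likely families, written as (elder, younger).
families = list(product("BG", repeat=2))


def conditional(event, given):
    """P(event | given), found by counting equally likely outcomes."""
    matching_given = [f for f in families if given(f)]
    matching_both = [f for f in matching_given if event(f)]
    return Fraction(len(matching_both), len(matching_given))


def both_boys(family):
    return family == ("B", "B")


# P(both boys | at least one is a boy)
print(conditional(both_boys, lambda f: "B" in f))     # 1/3

# P(both boys | elder one is a boy)
print(conditional(both_boys, lambda f: f[0] == "B"))  # 1/2
```

The difference comes from the size of the "given" pile: three families have at least one boy, but only two have a boy as the elder child.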
