Maggie's Farm
We are a commune of inquiring, skeptical, politically centrist, capitalist, anglophile, traditionalist New England Yankee humans, humanoids, and animals with many interests beyond and above politics. Each of us has had a high-school education (or GED), but all had ADD so didn't pay attention very well, especially the dogs. Each one of us does "try my best to be just like I am," and none of us enjoys working for others, including for Maggie, from whom we receive neither a nickel nor a dime. Freedom from nags, cranks, government, do-gooders, control-freaks and idiots is all that we ask for.
Wednesday, April 10. 2013
More fun with statistics: A simple math problem for our readers
This simple problem, offered by a reader, doesn't (I think) require Bayesian methods:
Suppose some one person stole some money and there are a hundred possible suspects. You use a lie detector, which has a 99% chance of a positive if you are guilty, and a 99% chance of a negative if you are innocent. Someone tests positive. What are the chances the person is guilty?
As with medical tests, this deals with rates of false negatives and false positives. Please explain your answer in the comments.
Posted by Dr. Joy Bliss in The Culture, "Culture," Pop Culture and Recreation at 15:56 | Comments (35) | Trackbacks (0)
Comments
Zero. Because the perp is not included in the 100 suspects.
This problem is too easy and does not need a statistical solution. It was the IRS.
If the perp is in the sample, how about 50%. You've got a "yes/no" question, and to me that's 50/50, regardless of the test reliability.
It's about 50% (assuming that the guilty party is one of the 100) and I think it does require Bayes, even if implicitly.
Why? Because the probability of finding the true guilty party (.01 × .99) is the same as the probability of a false positive (.99 × .01): both come to 0.0099. That's right. In this case, a test that is 99% 'accurate' is only right half the time.
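The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch under the comment's reading of the problem (one person is picked from the hundred and tested alone, and the thief is among the hundred); the function name `posterior_guilty` and its parameter names are made up for illustration.

```python
def posterior_guilty(prior, sensitivity, specificity):
    """Bayes' rule: chance of guilt given a positive test result."""
    true_pos = sensitivity * prior                     # guilty and tests positive
    false_pos = (1.0 - specificity) * (1.0 - prior)    # innocent and tests positive
    return true_pos / (true_pos + false_pos)

# Assumed inputs: 1 thief among 100 suspects, 99% hit rate, 99% correct negatives.
p = posterior_guilty(prior=1 / 100, sensitivity=0.99, specificity=0.99)
print(round(p, 6))
```

With a prior of 1 in 100 the two products are equal, which is why the result sits at one half.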
Knowing that 1% are guilty, and 99% are innocent, the expected number of positives is just about two, with half being false positives. It doesn't matter if you have one or two positive results, the probability is ½ that a positive will be a false positive. The situation with one positive could lead to false imprisonment, as many people will be convinced by the 99% 'accuracy' of the test.

"Knowing that 1% are guilty, and 99% are innocent, the expected number of positives is just about two, with half being false positives."
That statement is correct.
"It doesn't matter if you have one or two positive results, the probability is ½ that a positive will be a false positive."
That statement is incorrect. We can treat this as a joint probability: a test of the liar and a test of the innocent. They have equal probability, yes, but there may be more than one false positive. If we know there is only one positive, then we have eliminated many of the combinations on the second part of the joint probability. See below for more.

Apparently, an asterisk is not permitted in the comments. Probabilities are:
.01 X .99 and .99 X .01

If, on the other hand, one and ONLY one of the one hundred possible suspects tests positive?
If exactly one tests positive, the probability that it is the guilty party is greater than 99%. That is because the probability of a false negative is 1%.
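For anyone who wants to check the exactly-one-positive case, here is a small sketch using exact fractions. It assumes the thief is among the 100 and that the tests are independent of each other, neither of which the problem states; `guilty_given_sole_positive` is an illustrative name, not anything from the post. For the stated rates the ratio works out to 99/100.

```python
from fractions import Fraction

def guilty_given_sole_positive(n, hit, false_alarm):
    """Chance the single positive is the thief when all n suspects are tested."""
    # The thief tests positive and all n - 1 innocents test negative.
    guilty_only = hit * (1 - false_alarm) ** (n - 1)
    # The thief tests negative, one of the n - 1 innocents tests positive,
    # and the remaining n - 2 innocents test negative.
    innocent_only = (1 - hit) * (n - 1) * false_alarm * (1 - false_alarm) ** (n - 2)
    return guilty_only / (guilty_only + innocent_only)

answer = guilty_given_sole_positive(100, Fraction(99, 100), Fraction(1, 100))
print(answer)
```

Using `Fraction` avoids floating-point roundoff, so the result is exact rather than approximate.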
Are you kidding me. The problem already stated that the answer is 99%.
What is wrong with you people. Thank you. Don't think the question pertained to rocket science.
Posit that the perp is in the lineup and that 'someone' means exactly one (note the ambiguity in the problem statement). Expand
(.99 P_2 + .01 N_2) (.01 P_1 + .99 N_1)^99
and collect the terms linear in P_2 with no P_1 (the perp tests positive, no innocents do) or linear in P_1 with no P_2 (one innocent tests positive, the perp does not). That gives the prior probabilities:
one innocent, no guilty: 0.003697296376497265
one guilty, no innocent: 0.36603234127322914
Since it is given that one or the other occurs, normalize by dividing by the sum. That gives 0.9900000000000001. The trailing 1 is roundoff. If you write out the numbers and cancel terms it is exactly .99. The result is a bit different if 'someone' means at least one. In real life, one would also need to estimate the probability that the perp is not in the tested population. Let's hope I didn't embarrass myself ;)

Let's think of this as a disease with a 1% prevalence. The test for it is 99% sensitive and 99% specific. Let's bump the numbers up to a group of 10,000 to make this easier.
Since the prevalence is 1%, 100 people have the disease (they are the thieves). Since the sensitivity of the test is 99%, when we test these people we will get 99 true positives and 1 false negative. Since the specificity of the test is 99%, when we test those that don't have the disease (that is, the innocent) 9801 will test negative and 99 will test positive. So the positive predictive value of the test is the true positives divided by the total positives, which is 99/(99+99), or 99/198 = 0.5. So the likelihood that the person who tests positive is the thief is 50%. The negative predictive value is much better. The true negatives divided by the total negatives is 9801/(9801+1), or 99+%. So if you test negative you probably are not the thief.

Yes, happily the Storkdoc got it right. 50%.
Always take a Valium or two prior to a lie detector test if you are prone to anxiety.

Bumping the numbers up is cheating. If you have, say, a million in the lineup, then a single positive is almost certainly a false positive. If you bump the numbers down, say to 1, and given that the perp is among them, then the probability that you have the perp is 1. The result in the given problem depends on a nice choice of probabilities and numbers in the lineup.
Of course, if you only find one positive in a million, one might also suspect an error in the given probabilities, so that would be another hypothesis: the probabilities are screwy.

Chuck, bumping the lineup to a million people will mean that there are now 10,000 thieves, as the problem states that 1 in 100 is a thief. I bumped the numbers up to avoid fractions, but the answer always comes out to .5 for the positive predictive value.

As I mentioned, the problem statement is ambiguous in the interpretation of 'someone'. If you assume, as you did, that it means you pick a single person at random and then test them without regard to the test results of any of the others, then you are still wrong in the big number case, as the prior for guilty is 1/n and the prior for not guilty is (n - 1)/n. If n happens to be 100 the last is conveniently .99, but that doesn't hold for other values of n; in particular, as n -> infinity the first goes to zero and the second to one.
However, it is a lineup, and in that case it is more natural to assume that the 'someone' means one of them tested positive and the others negative. I don't think it is usual to only test one when a number of suspected perps are involved.
#9.1.1.1.1 chuck on 2013-04-10 21:21
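The point about lineup size can be made concrete with a short sketch. It assumes exactly one thief among n suspects and independent tests, and it covers both readings of 'someone' argued over in this thread; the function names are invented for illustration, and the closed form for the second reading comes from cancelling the common factor of 0.99^(n-2).

```python
def random_pick_posterior(n, hit=0.99, false_alarm=0.01):
    """One suspect picked at random and tested alone: the prior for guilt is 1/n."""
    prior = 1.0 / n
    return hit * prior / (hit * prior + false_alarm * (1.0 - prior))

def sole_positive_posterior(n, hit=0.99, false_alarm=0.01):
    """Everyone tested and exactly one positive (common factor cancelled)."""
    guilty = hit * (1.0 - false_alarm)             # thief positive, that innocent negative
    innocent = (1.0 - hit) * (n - 1) * false_alarm # thief negative, one of n - 1 innocents positive
    return guilty / (guilty + innocent)

for n in (1, 100, 1_000_000):
    print(n, round(random_pick_posterior(n), 4), round(sole_positive_posterior(n), 4))
```

Both readings give certainty for a lineup of one and a very small chance of guilt for a lineup of a million; they differ most near n = 100, which is where the argument in the comments comes from.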
You have it right, but the answer is wrong because you forgot to apply standard deviation rules which would bring the figure to 46.19764% across the entire population tested.
Don't you people understand the Noo Math? 2+2=5!
http://www.youtube.com/watch?v=3eTjftyAtIc

Yep, the storkdoc did get it right. Here's the Bayesian version (where G=guilty, I=innocent, and + means a positive result):
P(G|+) = P(+|G)P(G) / {P(+|G)P(G) + P(+|I)P(I)}
       = (0.99)(0.01) / {(0.99)(0.01) + (0.01)(0.99)}
       = 0.50

Wrong priors. Remember that in order for a single positive to occur, all the others in the lineup must test negative. In other words, you aren't making use of all the information. What you have analysed is: pick a person at random, then test them. The problem posits that all of them are tested and that there is only one positive.
Nope. It says "Someone tests positive." It doesn't say anything about anyone else. Just like getting a positive on a mammogram. Doesn't matter if anyone else got a positive.

Here's how I dissect it mathematically. Let us number the suspects from 1 through 100, with #100 being the actual thief.
There are 10^200 different possible ordered sets of test results, but we are going to throw out all the ones where the total number of positives is not exactly 1. I will represent each set of test results as a sequence of 100 numbers from 1 to 100, like this: {x1,x2,...,x100}. Think of each x as a roll of percentile dice. For each x, the values 1 through 99 represent correct results (negative for x1 through x99, positive for x100). The value 100 represents a wrong result. Then the sets of results we keep are:
#1 gets the only positive: { 100, 1..99, 1..99, ..., 100 } = 99^98 = 3.73464 x 10^195 cases.
#2 gets the only positive: { 1..99, 100, 1..99, ..., 100 } = 99^98 = 3.73464 x 10^195 cases.
and so on until...
#100 gets the only positive: { 1..99, 1..99, 1..99, ..., 1..99 } = 99^100 cases.
(For the non-programmers among us: a^b should be read as "a to the b power." This is the notation used in BASIC.)
Since the number of cases for each of #1 through #99 is the same, we can lump them together. So we have 99^99 cases where there's exactly one positive and it's an innocent guy, and 99^100 cases (99 times as many) where there's exactly one positive and it's the right guy. Therefore the guy who got the positive lie-detector test is guilty 99% of the time.

Old Army saying: "There's only one thief in the Army. Everybody else is just trying to get their shit back."
#7 jpm is correct, except that there are too many unknowns and variables in the question for it to be "a simple math problem". Just one example: you do not know how to run the "lie detector". I'd venture an inexperienced novice would get the lie detector test wrong more often than the 1% implied by the 99% accuracy rating.
I would just like to point out that statistics, or even rational numbering systems like 1 out of 99, can be interpreted any number of ways by intelligent people because of over reliance on logical systems that do not apply to the problem in either a practical or theoretical sense.
Which only goes to prove my thesis that all statistical analysis isn't worth spit.

Facts are stubborn things, but statistics are more pliable. (Mark Twain)
Ran some interesting tests.
If there is exactly one liar, or 1% liars on average, then someone who tests positive has a 50% chance of being a false positive.
If there is exactly one liar, or 1% liars on average, and you test everyone, and exactly one person tests positive, then there is a 99% chance you caught the person. However, the odds are better than 60% you will have more than one positive. Many might assume they were working together.
Now, here is an interesting example, which seems in keeping with the original scenario. If there is exactly one liar, and you test only until you have the first positive, that is, till you think you have caught the culprit, then the chance of a false positive is about 37%.
Here's a bit of the code; p(i) is people, r(i) is results:
    ' Lie detector test
    For i = 1 To 100
        If p(i) = 0 Then
            ' innocent: 1% chance of a false positive
            If Rnd < 0.01 Then r(i) = 1 Else r(i) = 0
        Else
            ' guilty: 99% chance of a true positive
            If Rnd < 0.99 Then r(i) = 1 Else r(i) = 0
        End If
        ' If r(i) = 1 Then Exit For ' tests for quitting at first positive
    Next i

Yep, the information that exactly one of all tests positive adds a lot, because if it is false then the liar tested innocent, which is unlikely. In that case, as you point out, there are most likely two positives, but that possibility is eliminated.
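For readers without BASIC to hand, the three scenarios from the simulation comment can be reproduced with a rough Python sketch like the one below. It is not the commenter's program: it assumes exactly one thief placed at a random spot in the line (the original fragment does not show how p(i) was filled), and the function name, seed and trial count are arbitrary choices.

```python
import random

def simulate(trials=20_000, n=100, hit=0.99, false_alarm=0.01, seed=1):
    """Simulate lineups with one thief and tally three readings of the problem."""
    rng = random.Random(seed)
    all_pos = all_pos_guilty = 0      # every positive, pooled over all lineups
    sole = sole_guilty = 0            # lineups with exactly one positive
    multi = 0                         # lineups with more than one positive
    stopped = stopped_innocent = 0    # test in order, stop at the first positive
    for _ in range(trials):
        thief = rng.randrange(n)      # assumption: the thief's place in line is random
        positives = [
            i for i in range(n)
            if rng.random() < (hit if i == thief else false_alarm)
        ]
        all_pos += len(positives)
        all_pos_guilty += thief in positives
        if len(positives) == 1:
            sole += 1
            sole_guilty += positives[0] == thief
        elif len(positives) > 1:
            multi += 1
        if positives:
            stopped += 1
            stopped_innocent += positives[0] != thief
    return {
        "any positive is guilty": all_pos_guilty / all_pos,
        "sole positive is guilty": sole_guilty / sole,
        "more than one positive": multi / trials,
        "first positive is innocent": stopped_innocent / stopped,
    }

stats = simulate()
for label, value in stats.items():
    print(f"{label}: {value:.3f}")
```

The four fractions should land near the figures quoted in the comments (about 50%, 99%, a bit over 60%, and about 37%), give or take sampling noise.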
I think the moral of the story is that the conditions and assumptions of a statistical test need to be clearly stated and reviewed for relevance, as it is easy to get things wrong.

OK I'll give a totally different bent..
The chance you got the right guy is 1% = 0.01.
0.01 × 99% true positive = 0.0099 = 0.99%
The chance you got the wrong guy is 99% = 0.99.
0.99 × 1% false positive = 0.0099 = 0.99%
So since these are equal, it's a 50% coin toss that you got the right guy. So I agree with other authors, just in English.

                            lie detector result
                    people    stole    didn't steal
test the innocent     99      0.99       98.01
test the guilty        1      0.99        0.01
                              ----
Total guilties out of 100     1.98

So the lie detector will find 1.98 guilties out of 100, of which only 0.99 is the real thief. So if you are found guilty, the chance is 0.99/1.98 = 50% that you really are.