More fun with statistics: A simple math problem for our readers

Wednesday, April 10. 2013

This simple problem, offered by a reader, doesn't (I think) require Bayesian methods:

Suppose some one person stole some money and there are a hundred possible suspects. You use a lie detector, which has a 99% chance of a positive if you are guilty, and a 99% chance of a negative if you are innocent. Someone tests positive. What are the chances the person is guilty?

As with medical tests, this deals with rates of false negatives and false positives. Please explain your answer in the comments.

Posted by Dr. Joy Bliss in The Culture, "Culture," Pop Culture and Recreation at 15:56 | Comments (35) | Trackbacks (0)

Comments

Zero. Because the perp is not included in the 100 suspects.

#1 John A. Fleming on 2013-04-10 16:27 (Reply)

This problem is too easy and does not need a statistical solution. It was the IRS.

#1.1 GoneWithTheWind on 2013-04-11 09:44 (Reply)

If the perp is in the sample, how about 50%. You've got a "yes/no" question, and to me that's 50/50, regardless of the test reliability.

#2 Cuetip on 2013-04-10 16:35 (Reply)

It's about 50% (assuming that the guilty party is one of the 100) and I think it does require Bayes, even if implicitly.

Why? Because the probability of finding the true guilty party (.01 .99) is very close to the probability of a false positive (.99 .01).

#3 Desconhecido on 2013-04-10 16:41 (Reply)

That's right. In this case, a test that is 99% 'accurate' is only right half the time.

Knowing that 1% are guilty, and 99% are innocent, the expected number of positives is just about two, with half being a false positives. It doesn't matter if you have one or two positive results, the probability is ½ that a positive will be a false positive.

The situation with one positive could lead to false imprisonment, as many people will be convinced by the 99% 'accuracy' of the test.

#3.1 Zachriel on 2013-04-10 19:13 (Reply)

"Knowing that 1% are guilty, and 99% are innocent, the expected number of positives is just about two, with half being a false positives."

That statement is correct.

"It doesn't matter if you have one or two positive results, the probability is ½ that a positive will be a false positive."

That statement is incorrect. We can treat this as a joint probability; a test of the liar and a test of the innocent. They have equal probability, yes, but there may be more than one false positive. If we know there is only one positive, then we have eliminated many of the combinations on the second part of the joint probability.

See below for more.

#3.1.1 Zachriel on 2013-04-11 12:14 (Reply)

apparently, an asterisk is not permitted in the comments. Probabilities are:
.01 X .99
and
.99 X .01

#4 desconhecido on 2013-04-10 16:43 (Reply)

If, on the other hand, one and ONLY one of the one hundred possible suspects tests positive?

#5 Formerly known as Skeptic on 2013-04-10 16:55 (Reply)

If exactly one tests positive, the probability that it is the guilty party is greater than 99%. That is because the probability of a false negative is 1%.

#6 Desconhecido on 2013-04-10 17:45 (Reply)

Are you kidding me. The problem already stated that the answer is 99%.
What is wrong with you people.

#7 jpm on 2013-04-10 19:14 (Reply)

Thank you. Don't think the question pertained to rocket science.

#7.1 Donna Devine on 2013-04-10 19:46 (Reply)

Posit that the perp is in the lineup and that 'someone' means exactly one (note the ambiguity in the problem statement). Expand (.99 P_2 + .01 N_2)(.01 P_1 + .99 N_1)^99 and collect the terms that are linear in P_2 with no P_1 (the perp tests positive, no innocent does) or linear in P_1 with no P_2 (one innocent tests positive, the perp does not). That gives the prior probabilities.

one innocent, no guilty:
0.003697296376497265

one guilty, no innocent:
0.36603234127322914

Since it is given that one or the other occurs, normalize by dividing by the sum

Gives 0.9900000000000001. The trailing 1 is roundoff. If you write out the numbers and cancel terms it is exactly .99.

The result is a bit different if 'someone' means at least one. In real life, one would also need to estimate the probability that the perp is not in the tested population.

Let's hope I didn't embarrass myself ;)

#8 chuck on 2013-04-10 19:38 (Reply)
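
For anyone who wants to check chuck's expansion without roundoff, here is a minimal sketch using exact fractions. It assumes, as his comment does, that the perp is among the 100 and that 'someone' means exactly one positive; the variable names are just illustrative.

```python
from fractions import Fraction

sens = Fraction(99, 100)   # P(positive | guilty)
spec = Fraction(99, 100)   # P(negative | innocent)
n_innocent = 99            # assumes exactly one perp among 100 suspects

# Perp tests positive and all 99 innocents test negative.
only_guilty = sens * spec ** n_innocent

# Perp tests negative and exactly one of the 99 innocents tests positive.
only_innocent = (1 - sens) * n_innocent * (1 - spec) * spec ** (n_innocent - 1)

# Given that one of these two outcomes happened, normalize by their sum.
posterior = only_guilty / (only_guilty + only_innocent)

print("one innocent, no guilty:", float(only_innocent))
print("one guilty, no innocent:", float(only_guilty))
print("P(guilty | exactly one positive):", posterior)
```

With exact fractions the common factors cancel, so there is no trailing roundoff digit in the ratio.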

Let's think of this as a disease with a 1% prevalence. The test for it is 99% sensitive and 99% specific. Let's bump the numbers up to a group of 10,000 to make this easier.

Since the prevalence is 1%, 100 people have the disease (they are the thieves). Since the sensitivity of the test is 99%, when we test these people we will get 99 true positives and 1 false negative.

Since the specificity of the test is 99%, when we test the 9,900 who don't have the disease (that is, the innocent), 9801 will test negative and 99 will test positive.

So the positive predictive value of the test is the true positives divided by the total positives, which is 99/(99+99) or 99/198 = 0.5.

So the likelihood that the person who tests positive is the thief is 50%.

The negative predictive value is much better. The true negatives divided by the total negatives is 9801/(9801+1), or 99+%. So if you test negative you probably are not the thief.

#9 storkdoc on 2013-04-10 19:49 (Reply)
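
storkdoc's head count can be sketched in a few lines. This is only an illustration of the arithmetic above under the same assumptions (1% prevalence, one randomly chosen person tested); the function name `predictive_values` is made up for this example.

```python
from fractions import Fraction

def predictive_values(population, prevalence, sensitivity, specificity):
    """Return (PPV, NPV) by counting expected test outcomes in a population."""
    guilty = population * prevalence
    innocent = population - guilty
    true_pos = guilty * sensitivity       # thieves who test positive
    false_neg = guilty - true_pos         # thieves who slip through
    true_neg = innocent * specificity     # innocents who test negative
    false_pos = innocent - true_neg       # innocents who test positive
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

rate = Fraction(99, 100)
ppv, npv = predictive_values(10_000, Fraction(1, 100), rate, rate)
print("PPV:", ppv, "NPV:", npv, f"(about {float(npv):.4%})")
```

Changing `population` leaves both values unchanged as long as the prevalence stays at 1%, which is the point storkdoc makes further down the thread.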

Yes, happily the Storkdoc got it right. 50%.

Always take a Valium or two prior to a lie detector test if you are prone to anxiety.

#9.1 Dr.B on 2013-04-10 19:54 (Reply)

Bumping the numbers up is cheating. If you have, say, a million in the lineup, then a single positive is almost certainly a false positive. If you bump the numbers down, say to 1, and given that the perp is among them, then the probability that you have the perp is 1. The result in the given problem depends on a nice choice of probabilities and numbers in the lineup.

Of course, if you only find one positive in a million, one might also suspect an error in the given probabilities, so that would be another hypothesis: the probabilities are screwy.

#9.1.1 chuck on 2013-04-10 20:05 (Reply)

Chuck, bumping the lineup to a million people will mean that there are now 10,000 thieves as the problem states that 1 in 100 is a thief. I bumped the numbers up to avoid fractions, but the answer always comes out to .5 for the positive predictive value.

#9.1.1.1 storkdoc on 2013-04-10 21:00 (Reply)

As I mentioned, the problem statement is ambiguous in the interpretation of 'someone'. If you assume, as you did, that it means you pick a single person at random and then test them without regard to the test results of any of the others, then you are still wrong in the big number case, as the prior for guilty is 1/n and the prior for not guilty is (n - 1)/n. If n happens to be 100 the last is conveniently .99, but that doesn't hold for other values of n. In particular, as n->inf the first goes to zero and the second to one.

However, it is a lineup, and in that case it is more natural to assume that the 'someone' means one of them tested positive and the others negative. I don't think it is usual to only test one when a number of suspected perps are involved.

#9.1.1.1.1 chuck on 2013-04-10 21:21 (Reply)
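
chuck's point about the lineup size can be made concrete with Bayes' rule. The sketch below assumes exactly one guilty person among n suspects and a single suspect picked at random and tested; the function name is hypothetical.

```python
def p_guilty_given_positive(n, sensitivity=0.99, specificity=0.99):
    """Bayes' rule for one randomly chosen suspect out of n, one of whom is guilty."""
    prior_guilty = 1 / n
    prior_innocent = (n - 1) / n
    true_pos = sensitivity * prior_guilty
    false_pos = (1 - specificity) * prior_innocent
    return true_pos / (true_pos + false_pos)

for n in (1, 100, 1_000_000):
    print(f"n = {n:>9}: P(guilty | positive) = {p_guilty_given_positive(n):.6f}")
```

With one thief in a lineup of n the answer is 1 for n = 1, about one half for n = 100, and close to zero for a million, whereas holding the prevalence at 1% (as storkdoc does) keeps it at one half for any group size.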

You have it right, but the answer is wrong because you forgot to apply standard deviation rules which would bring the figure to 46.19764% across the entire population tested.

#9.2 Tom Francis on 2013-04-10 21:20 (Reply)

Don't you people understand the Noo Math? 2+2=5!

http://www.youtube.com/watch?v=3eTjftyAtIc

#10 Jewel (Link) on 2013-04-10 20:36 (Reply)

Yep, the storkdoc did get it right. Here's the Bayesian version
(where G=guilty, I=innocent, and + means a positive result):

P(G|+) = P(+|G)P(G)/{P(+|G)P(G) + P(+|I)P(I)}

= (0.99)(0.01)/{(0.99)(0.01) + (0.01)(0.99)}

= 0.50

#11 dr.bill on 2013-04-10 20:41 (Reply)

Wrong priors. Remember that in order for a single positive to occur, all the others in the lineup must test negative. In other words, you aren't making use of all the information. What you have analysed is: pick a person at random, then test them. The problem posits that all of them are tested and that there is only one positive.

#11.1 chuck on 2013-04-10 20:59 (Reply)

Nope. It says "Someone tests positive."
It doesn't say anything about anyone else.
Just like getting a positive on a mammogram.
Doesn't matter if anyone else got a positive.

#11.1.1 dr.bill on 2013-04-11 01:28 (Reply)

Here's how I dissect it mathematically. Let us number the suspects from 1 through 100, with #100 being the actual thief.

There are 10^200 different possible ordered sets of test results, but we are going to throw out all the ones where the total number of positives is not exactly 1. I will represent each test result as a sequence of 100 numbers from 1 to 100, like this: {x1,x2,...,x100}. Think of each x as a roll of percentile dice. For each x, the values 1 through 99 represent correct results (negative for x1 through x99, positive for x100). The value 100 represents a wrong result.

Then the sets of results we keep are:

#1 gets the only positive: { 100, 1..99, 1..99, ..., 100 } = 99^98 = 3.73464 x 10^195 cases.
#2 gets the only positive: { 1..99, 100, 1..99, ..., 100 } = 99^98 = 3.73464 x 10^195 cases.
and so on until...
#100 gets the only positive: {1..99, 1..99, 1..99, ..., 1..99 } = 99^100 cases.

(For the non-programmers among us: a^b should be read as "a to the b power." This is the notation used in BASIC.)

Since the number of cases for each of #1 through #99 is the same, we can lump them together. So we have 99^99 cases where there's exactly one positive and it's an innocent guy, and 99^100 cases (99 times as many) where there's exactly one positive and it's the right guy.

Therefore the guy who got the positive lie-detector test is guilty 99% of the time.

#12 jdgalt on 2013-04-10 21:58 (Reply)
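
jdgalt's case count is easy to check with exact integers, since Python handles numbers this large natively. This is just a sketch of the counting argument above, with the same assumption that exactly one of the 100 tests comes back positive.

```python
from fractions import Fraction

# Thief (#100) rolls 1..99 (correct positive), the 99 innocents roll 1..99 (negative).
right_guy = 99 ** 100

# One of 99 innocents rolls 100 (false positive), the thief rolls 100 (false
# negative), and the other 98 innocents roll 1..99.
wrong_guy = 99 * 99 ** 98

share_right = Fraction(right_guy, right_guy + wrong_guy)
print("cases per innocent suspect:", f"{99 ** 98:.5e}")
print("P(guilty | exactly one positive):", share_right)
```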

1 in 100...self-explanatory.

#13 AHH!!! on 2013-04-10 22:03 (Reply)

Old Army saying: "There's only one thief in the Army. Everybody else is just trying to get their shit back."

#14 twolaneflash on 2013-04-10 23:22 (Reply)

#7 jpm is correct, except that there are too many unknowns and variables in the question, so it is not "a simple math problem". Just one example: you do not know how to run the "lie detector". I'd venture that an inexperienced novice would get wrong results far more often than the 1% implied by the 99% accuracy rating.

#15 mouseketeer on 2013-04-11 04:07 (Reply)

I would just like to point out that statistics, or even rational numbering systems like 1 out of 99, can be interpreted any number of ways by intelligent people because of overreliance on logical systems that do not apply to the problem in either a practical or theoretical sense.

Which only goes to prove my thesis that all statistical analysis isn't worth spit.

#16 Tom Francis on 2013-04-11 07:38 (Reply)

Facts are stubborn things, but statistics are more pliable. — Mark Twain

#17 Zachriel on 2013-04-11 07:41 (Reply)

Ran some interesting tests.

If there is exactly one liar, or 1% liars on average, then someone who tests positive has a 50% chance of being a false positive.

If there is exactly one liar, or 1% liars on average, and you test everyone, and exactly one person tests positive, then there is a 99% chance you caught the person. However, the odds are better than 60% you will have more than one positive. Many might assume they were working together.

Now, here is an interesting example, which seems in keeping with the original scenario. If there is exactly one liar, then test only until you have the first positive, that is, till you think you have caught the culprit, then the chance of a false positive is about 37%.

#18 Zachriel on 2013-04-11 12:15 (Reply)

Here's a bit of the code; p(i) is people, r(i) is results:

QUOTE:

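' Lie detector test: p(i) = 0 marks an innocent, r(i) = 1 marks a positive result

For i = 1 To 100

    If p(i) = 0 Then
        ' Innocent: 1% chance of a false positive
        If Rnd < 0.01 Then r(i) = 1 Else r(i) = 0
    Else
        ' Liar: 99% chance of a true positive
        If Rnd < 0.99 Then r(i) = 1 Else r(i) = 0
    End If

    ' If r(i) = 1 Then Exit For ' uncomment to quit at the first positive

Next i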

#18.1 Zachriel on 2013-04-11 13:54 (Reply)
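
For readers without BASIC handy, the three experiments Zachriel describes can be sketched as one Python simulation. This is not his code: the trial count, the seed, the random placement of the single liar, and all names are assumptions made for the illustration.

```python
import random

def simulate(trials=20_000, n=100, sens=0.99, spec=0.99, seed=0):
    """Estimate the three probabilities discussed above by simulation."""
    rng = random.Random(seed)
    pos_total = pos_false = 0        # every positive, however many occur
    single_total = single_true = 0   # runs with exactly one positive
    first_total = first_false = 0    # stop testing at the first positive
    for _ in range(trials):
        liar = rng.randrange(n)      # exactly one liar at a random position
        results = [
            rng.random() < (sens if i == liar else 1 - spec)
            for i in range(n)
        ]
        positives = [i for i, hit in enumerate(results) if hit]
        pos_total += len(positives)
        pos_false += sum(1 for i in positives if i != liar)
        if len(positives) == 1:
            single_total += 1
            single_true += positives[0] == liar
        if positives:
            first_total += 1
            first_false += positives[0] != liar
    return {
        "false share of all positives": pos_false / pos_total,
        "guilty given exactly one positive": single_true / single_total,
        "false given first positive": first_false / first_total,
    }

stats = simulate()
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The estimates should land near 50%, 99% and 37% respectively, with the usual sampling noise; a different seed or more trials will shift the third decimal.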

Yep, the information that exactly one of the tests is positive adds a lot, because if that positive is false then the liar tested innocent, which is unlikely. In that case, as you point out, there are most likely two positives, but that possibility is eliminated.

I think the moral of the story is that the conditions and assumptions of a statistical test need to be clearly stated and reviewed for relevance as it is easy to get things wrong.

#18.2 chuck on 2013-04-11 15:50 (Reply)

OK I'll give a totally different bent..

The chance you got the right guy is 1% = 0.01.
0.01 x 99% true positive = 0.0099 = 0.99%

The chance you got the wrong guy is 99% = 0.99.
0.99 x 1% false positive = 0.0099 = 0.99%

Since these are equal, it's a 50% coin toss whether you got the right guy.

So I agree with other authors, just in English.

#19 thehawkreturns on 2013-04-12 02:03 (Reply)

                          lie detector result
                          stole    didn't steal
test the innocent    99    0.99       98.01
test the guilty       1    0.99        0.01
                          -------
Total guilties out of 100  1.98

So the lie detector will find 1.98 guilties out of 100. So if you are found guilty, the chance is 1/1.98 = 50.505050...% that you really are.

#20 Hap on 2013-04-12 10:45 (Reply)

                          lie detector result
                          stole    didn't steal
test the innocent    99    0.99       98.01
test the guilty       1    0.99        0.01
                          -------
Total guilties out of 100  1.98

The lie detector will find 1.98 guilties out of 100. So if you are found guilty, the chance is 1/1.98 = 50.505050...% that you really are.

#21 Hap with pre tag on 2013-04-12 10:56 (Reply)

[pre] test of pre tag[/pre]

#22 Hap with pre tag on 2013-04-12 11:01 (Reply)
