Bayesian Bonus

I’m Alan Alda and this is Clear + Vivid, conversations about connecting and communicating.

Sometimes in our conversations we dig deep into things we don’t have time to include in the episode itself but we’d still like to share. Here’s one of those cases. If you listened to the conversation I had with my long-time friend Steven Strogatz, you’ll have realized that my fascination with mathematics is not well matched with my mathematical skills. I’ve tried many times over the years – with Steve’s help – to cudgel some of those skills into my brain, with mixed success. So I thought it would be fun if Steve could get me to grasp one of those math terms we’ve all heard about but that most of us don’t really get. You can judge for yourself if we were successful…

I wanna ask you something that I have often asked you in our conversations like this that we’ve had over the years. Can you help me understand something, anything, about something that I’ve heard about for many years, but don’t really have a clear understanding of?

For instance, the quadratic equation is very common in science, and I don’t know how it works. I have always heard about the Fourier transform. What a wonderful name. I have no idea what it does. And Bayesian statistics. How about Bayesian statistics?

Steve: 00:37:29 Great. Sure.

Alan: 00:37:31 Just so you know who you’re talking to, you know what level I’m at, let me tell you what I think it is and then you can work from that.

I think it’s a way of figuring out a problem that’s where something is changing all the time and you keep getting updates on the information you have about it so you can have a more accurate appraisal of it. Something like that.

For instance, I was on an island once with roads and stuff, and the only way you could get off the island was by taking a ferry. And I was visiting with a friend who was very rich, and I was curious about how rich people think. How did he get rich? Did he estimate things better than the rest of us? And we heard that the ferry had gone out of commission on one end of the island. The other end of the island had a ferry that was working, but if you took that ferry, it would take you three or four times longer to get where we both needed to go.

So I looked it up on my iPhone and found out what condition the ferry was in … the one that was broken … and I saw that it was out of commission and I took the long route.

Later I found out he kept checking on the ferry. He didn’t take one report about the ferry, he kept checking on it, and at the last minute, managed to get on the ferry and got home quicker than I did. And I thought he was using some kind of Bayesian method.

Now how close am I understanding anything about this?

Steve: 00:39:12 Yeah, I would say you’re quite close. That the idea that you have some estimate of the odds of something being favorable, or something happening, and then you update the odds as more information comes in … that’s exactly the heart of what Bayesian statistics is. So I think you got the gist of it.

I’d like to try to give you an example of just the kind of thinking that goes into it, actually, because it was instructive to me in my own teaching. It’s something that I was not trained in. I, in fact, never studied probability or statistics as a student. And so, over the years, when I’ve been asked to teach those subjects, I’m a bit terrified.

You know, there’s this old dream … a lot of people have this dream of … an anxiety dream that they’re in a course, some course in school, and they’re signed up for the course, but they didn’t know they were in the course, and they haven’t gone to any of the classes, and they’re sitting there for the final exam and they have no idea what’s on the page of that exam.

When you become a professor, you have a different dream, which is that you dream you have to teach a course that you don’t know the first thing about. And that is the way I feel when they ask me to teach about Bayesian statistics. That I don’t know anything about that.

So the first time I had to teach it, I stayed very close to the textbook. I would really just sort of inch along with the examples in the book and do everything the way the book did, hugging close to shore, ’cause I really didn’t know what I was doing. And on this one Bayesian problem that I used to assign, the students would always do it differently from the book, and I was inclined to think they were doing it wrong, except that they always kept getting the right answer.

Alan: 00:41:07 What was that?

Steve: 00:41:08 This happened year after year, and I started to realize the students had found a more intuitive, clearer way to think about Bayesian reasoning that I wanna do with you now.

Alan: 00:41:17 Oh, okay, great. Okay.

Steve: 00:41:19 Okay? This is an example of how the teacher can learn from the student. And I realize your show is about communication and trying to be clear and vivid, and I think this was a nice example of communication going in the other direction, of the students teaching the teacher that they found the right way to think about this, better than the textbook.

So here’s an example of the kind of thing that they would do. Now, the example I’m gonna give you has a lot of numbers in it, so I worry about that, but you could write them down, or I could just keep repeating them.

Alan: 00:41:51 I haven’t got a pencil and probably the listeners don’t, either, so—

Steve: 00:41:55 I’ll just keep repeating [crosstalk 00:41:55]

Alan: 00:41:55 I’ll just make sure I’m up to date with you.

Steve: 00:41:58 It’s not really important. You’ll get the gist of it anyway.

Alan: 00:42:01 One second. Let’s both clear our throat.

Steve: 00:42:06 All right.

Alan: 00:42:12 Okay, so how is this going to work?

Steve: 00:42:15 Okay, here it is. It’s a question that has to do with the real world … it’s sort of a grim topic, but it’s important. You’ll see why Bayesian reasoning is so important.

The thing I’m gonna give you here is an example that was used in a study of how doctors do or do not successfully use Bayesian reasoning. This is a study of practicing physicians and the psychology of how difficult these kinds of questions are.

Imagine a woman who goes in for her first mammogram. You know, you’re supposed to get a mammogram, they used to say at age 40, for breast cancer screening, or nowadays they say go in at age 50. But anyway, so imagine our hypothetical woman who goes in to have a mammogram, and she’s in what a doctor would consider a low-risk group. There’s no history of breast cancer in her family, she doesn’t have any symptoms. She’s reasonably young, there’s no reason to be worried.

So here’s the first number I wanna give you. For a woman of this type … a low-risk person … her odds, according to the doctors, the probability that a woman like this would have breast cancer is 0.8%.

Alan: 00:43:16 0.8.

Steve: 00:43:18 It’s already confusing, right?

Alan: 00:43:20 Less than 1%.

Steve: 00:43:21 Yeah, less than 1%.

Let me just read to you. It’ll be a bit to take, but we can go over it.

Okay, this was the question:

The probability that one of these women has breast cancer is 0.8%. That’s fact number one.

Second fact: you’re just told … and you can just accept these numbers … if a woman has breast cancer, the probability is 90% that she will have a positive mammogram.

Alan: 00:44:04 Positive meaning she’s got cancer.

Steve: 00:44:06 Yeah, well … positive meaning that the mammogram says that she does.

Alan: 00:44:10 Yeah. Okay.

Steve: 00:44:11 We don’t know that she really does, we just know the mammogram says that she does.

Alan: 00:44:14 I see.

Steve: 00:44:16 So if she does have breast cancer, the mammogram will pick it up 90% of the time. That is, it will say, 90% of the time, “you have breast cancer” when you really do.

Alan: 00:44:25 Uh huh.

Steve: 00:44:27 It’ll miss some. It’ll miss 10% of them. But 90%, when you have it, it’ll say you have it.

Now, the thing that’s making it further confusing is if a woman does not have breast cancer, the probability is known to be 7% that she will still test positive. In other words, a false positive.

Alan: 00:44:46 Is there anything in there that the person listening gets dizzy at this point?

Steve: 00:44:49 Oh, very dizzy. And so do the doctors. So do professionals. That’s the amazing thing.

So the question is … and this is the real question … imagine you’re a woman that has gone in for her first test, she’s with her physician, the physician has all those confusing numbers I just gave you, and unfortunately, her test comes back positive. The question is, how bad is this news? What is the probability she actually has breast cancer?

Alan: 00:45:14 Oh. That’s an interesting … and Bayesian reasoning can give you a better picture of it?

Steve: 00:45:21 I’m gonna give you the answer in a minute, but I’m gonna show you … I’m glad that you find it confusing, because so do the doctors. I mean, even professionals who have done this for decades cannot put all of these numbers together in a way that makes sense.

So the guy who did this study is a German psychologist named Gerd Gigerenzer, and Gigerenzer describes what happened when he … he said the first doctor he tested was a department chief at a university teaching hospital who had been teaching for more than 30 years. And this guy, he says, was visibly nervous trying to figure out what he would tell the woman. He just eventually gave up. He just said, “I don’t know, I mean, ask my daughter. She’s studying medicine.”

And it wasn’t just this one guy. Gigerenzer asked 24 other doctors the same question, and some of them said the woman’s odds were 1% that she had breast cancer, other doctors said 90% chance that she had it, and there were people everywhere in between. 50%, 80%. So you could imagine a poor woman asking a second opinion and a third opinion, and some doctors would be saying, “It’s very likely you have it, I’m very sorry to tell you.” And others would say, “Don’t worry.” So what’s the right way to think about it?

Alan: 00:46:37 Quickly review the numbers again?

Steve: 00:46:39 Yeah, let’s go over them again.

So first of all, she’s supposed to be in a low-risk group—

Alan: 00:46:44 Which means that she’s got less than 1%.

Steve: 00:46:46 Less than 1%, so that’s like when you said to me, at the beginning, that Bayesian reasoning is about updating information as new information comes in. What you should think of is, before the woman goes in for the test, she knows that her odds are good. Less than 1% chance of having breast cancer, just by virtue of her age and no family history.

Okay, so her odds, before she does any test, are good. Less than 1% chance of trouble.

Then, new information comes in, the new information being she has just tested positive for cancer. So now we have to update her odds, based on two things that we know, which is that sometimes the test is wrong in one direction, and sometimes it’s wrong in the other. That is, it can give either false positives or false negatives.

Alan: 00:47:37 And what’s the difference between those?

Steve: 00:47:40 A false positive would mean she doesn’t have it, but the test says she does.

Alan: 00:47:43 Does she get as many false positives as false negatives?

Steve: 00:47:53 Well, the numbers that we were given were that … sorry, now I’ve gotta look at my own piece of paper here … no laughing matter, it’s important. This poor hypothetical woman.

Yeah, sorry. So we said that if she does have breast cancer, 90% of the time, the test will pick that up and say that she has it. But that if she does not, the test will still say she has it 7% of the time. So the information I’ve given you is that there’s a 7% chance of a false positive. That she tested positive even though she doesn’t have it.

I also gave you information about the sensitivity of the test. That it picks up cancer 90% of the time that it’s there. I didn’t give you information about the false negative. I guess I have indirectly given it to you, which is that if she does have breast cancer, 10% of the time, the test will miss it.

Alan: 00:48:53 So it sounds like the two figures that are important are the difference between the false positives and false negatives. The fact that her past history puts her at less than 1% seems irrelevant, once she takes the test—

Steve: 00:49:08 No, it’s really relevant. No, it’s very relevant.

Alan: 00:49:11 So tell me why. That’s interesting.

Steve: 00:49:14 It’s interesting that you have that intuition. That’s one of the big lessons of Bayesian thinking is that the base rate … that is, in this case, the rate that a person like her probably doesn’t have cancer … really important to know. Keep it strongly in your mind. She’s from a low-risk group. The odds are really good she doesn’t have cancer. There’s no reason to think she would. And just because the test says she does, you still shouldn’t necessarily believe it.

So here’s what my students figured out. This thing that I just gave you, totally bewildering. I mean, if you’re totally confused at this point, that’s the right reaction. And it’s because … and this is one of Gigerenzer’s big insights in his study … people, including doctors, don’t know how to think about probabilities as probabilities. I gave you all these percentages and they’re super confusing when I say it like that.

If I gave it to you as numbers, like the numbers that you learned about in elementary school, you’ll be able to do the problem easily.

Alan: 00:50:10 So go ahead.

Steve: 00:50:12 It’s numbers that we should be thinking about, not percentages.

Okay, so here’s the good way to think of it, and this is what my students would do. And this is what I didn’t realize. The book doesn’t do it this way! This is what the students hit by being smart little kids.

They just would think about a group of 1000 women. Okay, so let’s do that this way. Let me give you the same numbers that I gave you, except not as percentages but as actual numbers.

Instead of saying .8%, which is already bewildering, I’m gonna tell you that 8 out of every 1000 women have breast cancer … that are women like this hypothetical woman in the low-risk group. That’s what .8% means, 8 out of 1000.

Alan: 00:50:51 8 out of 1000 of low-risk women will have it, in spite of the fact that they’re low-risk.

Steve: 00:50:58 Right. Exactly. Perfect so far.

So 8 out of 1000. So we’re gonna imagine this hypothetical cohort of 1000. And of these 1000, 8 of them, unfortunately, do have it.

Now, of these 8, 7 will test positive on their mammogram. Why did I say 7? ‘Cause I told you earlier that the test would pick it up 90% of the time. So of these 8 that actually have it, if I do 90% of 8, that would be 7.2. So I’m just rounding to make it simpler to keep in our head, of these 8, 7 will test positive.

Alan: 00:51:34 Okay.

Steve: 00:51:34 Because the test is pretty good! It’s gonna catch it 90% of the time.

Alan: 00:51:37 Yeah.

Steve: 00:51:37 So of the 8 out of 1000, 7 will test positive.

Okay, now, what we also have to realize is that there’s 992 remaining women, because 8 of them do have breast cancer. 992 … that’s the rest of the 1000 … do not have breast cancer. ‘Cause remember, they’re low-risk. 992 out of 1000 actually don’t have it, yet 70 of them will test positive. Because we said earlier that the test will give you a false positive 7% of the time, and 7% of these 992 women is about 70 women.

In other words, what I’m saying is, 70 women will test positive even though they don’t have it, and 7 will test positive because they really do have it.

I’m sorry, is that too much to do over the … should I say it again?

Alan: 00:52:27 I’m not sure I’m still awake.

Steve: 00:52:29 I’m sorry.

Alan: 00:52:31 No, I’m trying to follow you, but when you started putting numbers onto percentages, it got hard.

Steve: 00:52:39 Okay, so I’ll say it simpler. Out of the 1000, 8 of them do have breast cancer. 7 will test positive. 70 will also test positive, even though they don’t have it. Because the test makes mistakes.

Alan: 00:52:57 Yeah.

Steve: 00:52:58 And so 7 out of the total that tested positive … which is 7 + 70 … 7 out of 77 of those that tested positive will actually have it.

So the bottom line is, only 1 out of 11 that test positive actually have cancer. The odds are 9% for our poor woman.
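The head count Steve walks through can be checked with a few lines of Python. This is only a sketch of the arithmetic in the conversation: the three rates are the ones quoted above, while the variable names and the cohort size of 1000 as a constant are illustrative choices, not anything from Gigerenzer's study.

```python
# Natural-frequency version of the mammogram problem.
# The three rates come from the conversation; all names are illustrative.
COHORT = 1000
PREVALENCE = 0.008           # 0.8% of low-risk women have breast cancer
SENSITIVITY = 0.90           # 90% of women with cancer test positive
FALSE_POSITIVE_RATE = 0.07   # 7% of women without cancer still test positive

with_cancer = COHORT * PREVALENCE                        # about 8 women
without_cancer = COHORT - with_cancer                    # about 992 women

true_positives = with_cancer * SENSITIVITY               # about 7.2
false_positives = without_cancer * FALSE_POSITIVE_RATE   # about 69.4

# Share of all positive tests that belong to women who really have cancer.
exact = true_positives / (true_positives + false_positives)

# The spoken version rounds the head counts to 7 and 70.
rounded = 7 / (7 + 70)

print(f"unrounded head counts: {exact:.1%}")
print(f"rounded head counts:   {rounded:.1%}")
```

With the unrounded counts the answer comes out a little above 9%, and with the rounded counts of 7 and 70 it is 1 in 11, so the rounding used in conversation barely changes the result.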

Alan: 00:53:19 If they come in with low risk.

Steve: 00:53:22 They’re low-risk and they test positive, the odds are still only 9% that they have it. I mean, her odds went up, but we updated the—

Alan: 00:53:22 The odds went up, but if they’re still—

Steve: 00:53:33 They’re still not that terrible.

Alan: 00:53:34 It’s not a—

Steve: 00:53:34 I mean, it’s not like she has it. She might have it.

Alan: 00:53:37 Not as horrible news as it—

Steve: 00:53:38 It’s not as horrible as it would sound.
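To see why the starting risk matters so much, the same calculation can be wrapped in a small function and run with different starting risks. This is a rough sketch: the function name `posterior` is made up for illustration, and the 5% and 50% starting risks are hypothetical comparison points, not figures from the conversation.

```python
def posterior(prior, sensitivity=0.90, false_positive_rate=0.07):
    """Chance of actually having cancer after one positive test.

    prior is the chance of cancer before the test; the default test
    rates are the ones quoted in the conversation.
    """
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# 0.8% is the low-risk figure from the conversation;
# 5% and 50% are hypothetical higher-risk groups for comparison.
for prior in (0.008, 0.05, 0.5):
    print(f"before the test {prior:.1%} -> after a positive test {posterior(prior):.1%}")
```

The same positive result means roughly 9% for the low-risk woman, about 40% for someone starting at 5%, and about 93% for someone starting at even odds, which is why the base rate cannot be dropped once the test result is in.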

So okay, that was difficult to do in the medium that we’re using here, but … I guess what I was hoping to convey was that … first of all, that my students taught me the right way to think about this. That percentages are confusing, whereas imagining sort of a tangible population of 1000 women, then you could just calculate things. It was much more straightforward, and that was the way they would do it. And the thing that we just did, painfully and confusingly, was an example of so-called Bayes’ theorem. We just did Bayesian reasoning.
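For readers who want the textbook form, here is a sketch of Bayes' theorem applied to the numbers in the conversation. The symbols are a notational assumption, not something used in the episode: C stands for "has breast cancer" and + for "tests positive".

```latex
P(C \mid +)
  = \frac{P(+ \mid C)\,P(C)}{P(+ \mid C)\,P(C) + P(+ \mid \neg C)\,P(\neg C)}
  = \frac{0.90 \times 0.008}{0.90 \times 0.008 + 0.07 \times 0.992}
  = \frac{0.0072}{0.07664}
  \approx 0.094
```

The small gap between this 9.4% and the 9% quoted in the conversation comes from rounding the head counts to 7 and 70.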

Alan: 00:54:15 Who was Bayes and how did he arrive at this? Was it a practical problem he was working on?

Steve: 00:54:21 Honestly, I don’t know much about him. They always call him Reverend Thomas Bayes, so—

Alan: 00:54:26 So maybe he was figuring out the collection plate as it went down the [crosstalk 00:54:29]

Steve: 00:54:29 I don’t know what his “Reverend” had to do with anything, but yeah, he’s Reverend Thomas Bayes. I’m not sure, I think he’s some time in the 1800s and I don’t actually know why he was driven to think about all this.

Well, I’m not sure it’s fair to say that we just did Bayesian reasoning. But it’s nice to know that even famed math professor Steven Strogatz had to rely on his students to really grasp it himself. And if you made it this far be sure to check out our longer conversation, where the going is a little easier…

(My thanks to the sponsors of this episode. All the income from the ads you hear goes to the Center for Communicating Science at Stony Brook University. Just by listening to this podcast, you’re contributing to the better communication of science. So, thank you.)

This episode was produced by Graham Chedd with help from our associate producer, Sarah Chase. Our sound engineer is Dan Dzula, our Tech Guru is Allison Coston, our publicist is Sarah Hill. You can subscribe to our podcast for free at Apple Podcasts, Stitcher or wherever you listen. For more details about Clear + Vivid, and to sign up for my newsletter, please visit alanalda.com. You can also find us on Facebook and Instagram at “Clear and Vivid” and I’m on Twitter @alanalda.

Thanks for listening. Bye bye!