**by Greg Mayer**

So, here’s the answer given by Manil Suri to the puzzle he posed in the *New York Times* on Sunday. First, restating the puzzle:

Four cards are laid in front of you, each of which, it is explained, has a letter on one side and a number on the other. The sides that you see read E, 2, 5 and F. Your task is to turn over only those cards that could decisively prove the truth or falsity of the following rule: “If there is an E on one side, the number on the other side must be a 5.” Which ones do you turn over?

And here is his answer:

Clearly, the E should be turned over, since if the other side is not a 5, the rule is untrue. And the only other card that should be flipped is the 2, since an E on the other side would again disprove the rule. Turning over the 5 or the F doesn’t help, since anything on the other side would be consistent with the rule, but not *prove* it to be true.
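The reasoning in this answer can be checked by brute force. What follows is only a minimal sketch, assuming the hidden face of a letter card may be any digit 0–9 and the hidden face of a number card any letter A–Z (the puzzle does not specify these ranges, and the function names are made up for illustration). A card is worth turning over only if some possible hidden face would violate the rule.

```python
import string

LETTERS = string.ascii_uppercase   # assumed range of letter faces
NUMBERS = "0123456789"             # assumed range of number faces


def violates(letter, number):
    """The rule 'E on one side -> 5 on the other' fails only for E paired with a non-5."""
    return letter == "E" and number != "5"


def must_turn(visible):
    """A card is informative only if some hidden face could break the rule."""
    if visible in LETTERS:
        return any(violates(visible, n) for n in NUMBERS)
    return any(violates(l, visible) for l in LETTERS)


cards = ["E", "2", "5", "F"]
to_flip = [c for c in cards if must_turn(c)]
print(to_flip)  # the E and the 2
```

Under these assumptions, nothing behind the 5 or the F can ever make `violates` true, which is why those two cards drop out.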

In the article, Suri points out that this is the Wason problem, as many readers recognized (I’d never heard of it). He goes on to point out that studying math improves a person’s ability to answer such problems correctly, and argues that the Wason problem is an especially good way to teach critical thinking, and that its use should be encouraged.

He notes that on average 10% of people get it right. I haven’t done a count, but many readers got it correct in the comments– far more than 10%, I would venture. One thing I learned from the responses is that many readers read the problem as referring to just these 4 cards, and that was useful in finding the answer. It never occurred to me– indeed, I think it *would* never have occurred to me– that the problem referred to just 4 cards. I assumed they were a random sample from a potentially infinite universe of cards. And that, to me, is where the real interest of the problem lies. Different people will read the same puzzle and take the setup or the question to be quite different. This matter of interpretation also occurred to many readers, and it is one to which I will return in a later post.

The Steve Pinker video I posted yesterday is the exact same problem, and Pinker gives Suri’s answer. My apologies to readers who were looking for my reveal yesterday, but Pinker was the reveal. But, as I mentioned above, Suri and Pinker’s answer was not the item of interest to me, and reading through the many comments yesterday made me realize my own interpretation of the problem was one of many, and so I have needed to think through my interpretive analysis further.

## 54 Comments

I’m betting that many WEIT readers (being a well-read and well-educated bunch 🙂 ) had come across the problem before.

I think the logic problem itself was rather easy; I would expect readers of this site to be able to solve it. This would be true regardless of prior knowledge. I would have to ask those who were familiar with the problem whether they were able to solve it the first time it was offered. I would expect that most could.

What surprises me is the 10% success rate. That figure was presented without citing sources. Is this well documented?

When I watched the Pinker video I suspected the four people shouting out answers were plants, allowing Pinker to make his points. Also in the Pinker video, it was not true that people were better at solving the contextualized problem — there were still many wrong answers shouted out by the audience.

It is on that order. I’d have to look it up again, but it is a very small amount.

But it is *not* necessarily a failure of reasoning or of “domain specificity” like the evolutionary psychologists say.

EPs seem to presuppose we use classical logic. One should ask “why is *that* the standard”, especially as the use of conditionals in classical logic is at least plausibly regarded as odd.

(Cf. _Human Reasoning and Cognitive Science_.)

If you have a larger deck, the problem devolves to the problem of induction, most recently highlighted by the book The Black Swan.

No, it devolves to the problem of hypothetico-deduction.

Wason originally designed the task to test whether people were good Popperians — that is, whether they were good at spotting evidence that falsified a theory (in this case, violated a rule). They were not; they usually looked for evidence that would confirm it. Subsequently evolutionary psychologists showed that people were good at spotting violations of social-contract type rules — that is, at spotting ‘cheats’ who broke rules like ‘If you take the benefit, then you must pay the cost’ by not paying the cost. For example, which cards do you need to turn over to see who broke the rule: “If you borrow the car, then you must fill it with gas”?

– Borrowed car

– Walked

– Filled with gas

– Did not fill with gas

This has the same logical structure as the E, 2, 5 and F example, but people perform much better (~75% correct).
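The shared structure can be made explicit. This is only a sketch with hypothetical names: each visible face is labeled by whether it shows the antecedent P or the consequent Q of “if P then Q”, and whether it shows that part as true or false. The cards worth checking are those showing P true (modus ponens) or Q false (modus tollens).

```python
def cards_to_check(faces):
    """faces maps a visible face to (part, shown_true), where part is "P" or "Q"."""
    return [
        face
        for face, (part, shown_true) in faces.items()
        if (part == "P" and shown_true) or (part == "Q" and not shown_true)
    ]


# Abstract version: P = "there is an E", Q = "there is a 5".
abstract = {
    "E": ("P", True),
    "F": ("P", False),
    "5": ("Q", True),
    "2": ("Q", False),
}

# Social-contract version: P = "borrowed the car", Q = "filled it with gas".
social = {
    "Borrowed car": ("P", True),
    "Walked": ("P", False),
    "Filled with gas": ("Q", True),
    "Did not fill with gas": ("Q", False),
}

abstract_picks = cards_to_check(abstract)
social_picks = cards_to_check(social)
print(abstract_picks)
print(social_picks)
```

The same function picks out the E and the 2 in one case and “Borrowed car” and “Did not fill with gas” in the other; only the labels differ.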

More here: https://www.cep.ucsb.edu/topics/exchange.htm

But see also above – there’s a confound!

There’s also a similar game, demonstrated here by Derek Muller. You can play along.

Yes, my first impression was that the question was ambiguous. Is it just these 4 cards, or are they a sample from a larger population?

My reaction too.

I initially thought they were a sample of a larger population, then I realised that 4 cards could never ‘prove’ a rule for a large population (though they could disprove it).

cr

I didn’t even read it as ambiguous; I just assumed, for no good reason, that the four were part of a larger and undefined set. I then got hung up on the word “prove”, which seemed impossible, the opposite obviously being possible.

This also surprises me. The problem clearly says 4 cards are presented. It says nothing about there being a deck of cards or a larger universe of cards. I can understand why we would want to apply the problem and solution more broadly, but a larger universe is not indicated in the words of the problem.

I was intrigued by one comment that suggested these four cards are part of a larger group where all letters in the alphabet are on one side, and their corresponding positions on the other which fits with the E must be 5 rule. If we turned over the 2 and found B on the other side, would this explanation present itself to us?

Yes, it’s an obvious possibility, but not one that the 4 cards could prove (for anything except those 4 cards, and you’d have to turn them all over to prove it). They could of course disprove it though.

cr

If stated carefully, they aren’t a sample. But there is still a problem with proposed analyses of people’s answers.

The most successful at solving Wason’s task and similar exercises are people with a smattering of logical training, ie logicians (including those working in informatics), philosophers, and mathematicians. But that’s the case mainly because they are familiar with formal languages – and many other people are not.

As oliverscottcurry pointed out above, the problem at hand is not the logical task itself (ie to use modus ponens and modus tollens) but the level of abstraction in the task’s presentation. As soon as you give Wason’s task to people in a relatable, everyday kind of fashion, they are much more successful at it.

Or something else: there’s some evidence that people treat conditionals like one does in a default logic, rather than classical.

Since the question asked about the truth or falsity of “a rule”, I believe the reader is right in expecting a larger (unseen) card set. A rule is an expression of a general principle.

But one nearly always must define the universe to which the rule applies. No rule (not even the rules of physics) can apply to the entire universe (which includes imaginary entities whose behavior is limited only by our imaginations).

The universe here seems clearly stated to me: these four cards. We are asked for a general principle about this limited universe.

To me too, but it doesn’t matter since the answer is the same for a disproof.

Better IMO to recognize one’s error and learn something if one got it wrong than to blame the problem, but that’s not the common reaction here it seems.

I do not see in the question where it says *these* four cards, indicating that they are the universe. It says “Four cards are laid in front of you…”, a statement that conveys no information since I can see that.

If a card shark lays four playing cards before me and asks a tricky question I wouldn’t assume they are the only four cards.

BTW, I did assume they are the only four cards because I recognized it as a logic problem, not an inference problem.

I would think just the opposite. With a group of 4 cards it can be decisively stated whether the rule is true or false – which is what the problem asks us to do. Why complicate matters by positing a larger universe of cards? If the number of cards does increase, we have to turn over more cards to see whether the rule is true or false, but with any given collection of cards, we can still decisively state that the rule is true or false. Yes, problems do occur when we do not know the extent of the universe we are looking at.

I found the logic problem itself trivial, but the discussion very interesting. I hope we can look at the original article and talk about math instruction at some point.

If a reader assumes this is four from a larger deck, the question could be rephrased to “which cards are relevant to helping determine if the rule is being followed.” That would not imply that the question can be definitively answered in four cards out of an unknown larger sized deck.

Reblogged this on The Logical Place.

For once that’s appropriate. 😉

I’d point out that the same problem turned up in Daniel Kahneman’s ‘Thinking, Fast and Slow’, along with a whole host of fascinating little puzzles and test questions that highlight our brains’ various blindspots.

My favourite is this one, the ‘Linda Problem’, which was originally thought up by Amos Tversky and Kahneman himself, and subsequently tested on their students:

“Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

On the basis of the preceding description, which of the following is most likely?

1. Linda is a teacher in elementary school.

2. Linda works in a bookstore and takes Yoga classes.

3. Linda is active in the feminist movement.

4. Linda is a psychiatric social worker.

5. Linda is a member of the League of Women Voters.

6. Linda is a bank teller.

7. Linda is an insurance salesperson.

8. Linda is a bank teller and is active in the feminist movement.”

If you’ve already answered this question and glanced over the choices you’ll see that only two of the answers are interrelated: answers 6 and 8. These were the answers that Kahneman and Tversky were focused on, because they wanted to see if people realised that it is logically impossible for the description in 8 to be more likely than the description in 6. The number of specifically _feminist_ bank tellers can never exceed the number of bank tellers. In a Venn diagram, 8 would be subsumed by 6.
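The subset argument can be illustrated with a small count. This is only a sketch over a made-up population: the people, the 5% and 60% rates, and the variable names are invented purely for illustration and are not data from Kahneman and Tversky. Whatever rates are chosen, the people matching choice 8 are a subset of those matching choice 6.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Invented population: each person is (is_bank_teller, is_feminist).
population = [
    (random.random() < 0.05, random.random() < 0.6) for _ in range(10_000)
]

tellers = sum(1 for teller, _ in population if teller)
feminist_tellers = sum(
    1 for teller, feminist in population if teller and feminist
)

# Conjunction rule: P(teller and feminist) <= P(teller).
print(feminist_tellers <= tellers)
```

Changing the invented rates, even making feminists far more common than tellers, cannot reverse the inequality, because every feminist bank teller is already counted among the bank tellers.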

Infamously, the answers they got back from students and participants in general were the opposite: they defied logic and decided that 8 was more likely than 6. And the number of respondents who did so was a huge majority. Kahneman and Tversky grew increasingly incredulous, and eventually reduced the choices from all eight of the ones you see above to just the two important ones; they were convinced that people would spot the fallacy if their choice was limited to 6 and 8. Even then, however, the students chose, by an 85% majority, that 8 was more likely than 6.

I have to say, this was one of the problems in Kahneman’s book that I immediately understood, presumably because I was primed by the book itself to spot certain fallacies before they arose, but I have tried this problem out on so many people and none of them have chosen ‘Linda is a bank teller’ as the most likely description, even after the choices were narrowed down to just 6 and 8. Certain relatives, who I will not name, just didn’t understand what was wrong with their decision even after lengthy explanations. They got quite obstinate actually, folding their arms and repeating their answers with a hint of truculence.

Kahneman’s book as a whole is fascinating (so long as you ignore the strangely incongruous nods to the self help genre that come at the end of each chapter). Almost every psychological tic and hiccup you can think of comes up in the book somewhere, and once you read it you start seeing them everywhere.

(Apologies to the many, many readers here who will know the Linda Problem by heart. But it was only because of a BTL post that I was recommended Kahneman’s book in the first place.)

The Linda problem I use when trying to hammer into students the idea that if you search for a book thus

Audiology for beginners

& get zero hits, adding that it is by Smith will not make it easier to find!

I never cease to be amazed that they immediately think adding more detail will be more likely to find it!

When I explain, the light goes on & they say, oh yes!

It occurs to me that your answer is indicative of you coming at the problem like a scientist rather than a mathematician. When you hear a ‘rule’, your assumption is that it’s like a law of nature – a hypothesis that is supposed to be generally true, and of which this sample is designed to test the prediction. Whereas a mathematician assumes it only refers to the specific case in front of them.

I’m reminded of the old joke about the sheep that’s black on at least one side.

Well, now you *have* to tell the joke.

Wikipedia has it here:

https://en.wikipedia.org/wiki/Mathematical_joke#Stereotypes_of_mathematicians

Sorry – it’s such an old one I assumed everyone would know it. The story goes that an economist, a statistician and a logician are on a train to Scotland. Just after they pass the border, the economist happens to look out of the window and see a black sheep.

“Look!”, she says, “The sheep in Scotland are black”

“No”, the statistician corrects her. “At least some sheep in Scotland are black”.

“No”, says the logician. “In Scotland there exists at least one sheep which is black on at least one side”.

I’d heard it as “an astronomer, a physicist, and a mathematician” …. probably depends who you want to rag

In response to your earlier point regarding interpretation and rules, I know that’s what I did. Never even considered the limited nature of the problem.

Yes, this struck me too. Quite a few readers seemed to interpret the problem this way, which is totally reasonable given more scientific vs. mathematical training. Mathematical training teaches a strict, literal reading of the problem. So if only 4 cards are referred to in the problem, and no larger set is identified, then the problem can only be about the 4 cards.

Whereas we statisticians saw both interpretations because we are half-scientist half-mathematician (and not very good at either).

but at least you can determine that competence and provide confidence intervals

Lol, exactly right!

Labelling the cards E and F makes me think of them as part of a series of at least six cards, A, B, C, D, E and F. Maybe if they had been described as shapes (circle, triangle, wavy line) and had colours on the other side (blue, red, green) I wouldn’t have assumed these were random cards dealt from a pack.

Yes!

This is an inherent problem with language itself: right answers depend on one’s interpretation of the language. It’s not a problem with the particular question, but with using any language to try to ask anything.

While lots of written (or spoken) texts have fairly common interpretations that one can expect the majority of people to make, if you ask a large enough number of people, and especially (as the comments above noted) people with different backgrounds or cultures, you will see many interpretations are possible and will be made.

I learned this the hard way: for the last five years, I had one thousand students/year. When writing exam questions posed to so many people, it is impossible to ask anything without some subset being unsure which interpretation I wanted.

If you ask enough people, minority interpretations that usually wouldn’t pop up will be made by some. This becomes a very large problem when writing standardized tests that are taken by huge numbers of people: SAT, ACT, GRE, IQ, etc., and it remains wholly unaddressed. Those test scores cannot be interpreted as completely accurate.

See above: the use of “if” is crucial. There are many uses of “if” in natural language, and some of them *might* be like the “if” in a default logic.

I wouldn’t say the issue is wholly unaddressed, but it is often inadequately addressed. The whole discipline of measurement and evaluation is primarily concerned with such problems. Major testing companies (like ACT, ETS, or those behind TOEFL, etc.) are very concerned with measurement error modelling that aims to account for exactly the types of problems you mention here.

There are a variety of theories designed to help tackle these issues: e.g. latent response models, item-response theory, classical test theory. They do a decent job in some circumstances, but there is much that remains to be done (this is an area that I actively work in, so that opinion is of course biased).

In particular, problems arise when one tries to make inferences about an individual from a small test; also, when one isn’t exactly sure what the designed test is actually measuring.

Glad to hear that! Would they be interested in tackling my exams?:)

I’m sure they would be for an exorbitant price! Practically speaking, in the classroom, the best way to minimize these issues is to give many exams or evaluations that are constructed by several different people. Of course, that creates a lot more work for instructors. It’s a very imperfect system.

Out of curiosity: Ed, have you looked into the idea that humans are sort of “default logicians” rather than classical logicians by, if you’ll pardon the expression, default?

Sure, different interpretations of what, exactly, is being asked are possible.

But I do think it should’ve been reasonably obvious that the question applied to only those 4 cards. Otherwise there is no solution. There is no minimum number of cards you can turn over to prove or disprove the rule with an infinite (or even just indefinite) number of cards.

I would say that the ability to see what a person is asking for, despite the technicality of language ambiguities, is part of critical thinking and reasoning. If you can’t reason in the presence of minor ambiguities, welp, you’re fucked. This is an ambiguous world.

No, the ability to successfully discern which of many interpretations a speaker means depends greatly on your having extra information about things like what that speaker is like and what his or her background is, and making an educated guess from that.

As I said, when Pinker used the verb “test”, as in “test this proposition”, I truly didn’t know his meaning. In my field, to “test” a hypothesis means something quite different than to “prove” a hypothesis.

I did guess it, from the post saying that Pinker was asking the “same problem” as the NYT writer did. The NYT writer was asking for “decisive proof” of the proposition, so I inferred that Pinker was as well. But, without that extra information, I truly would not have known.

If that question had been a written one, on a multiple choice test, without the context I would not have known the answer.

And yes, life is ambiguous and so is language. That’s why absolute conclusions can’t be drawn from those sorts of questions. Glad to see, from the comment above, that those who write exams for huge numbers of people are trying to minimize the problem.

I don’t disagree that striving for as much clarity as possible is what we should do, especially in the context of administering tests. But as I wrote, I think the puzzle’s meaning was reasonably clear, especially in an informal context. There is at least one (the one I mentioned) good reason to assume the puzzle was about only those four cards.

What meaning of “test” would’ve led to a different interpretation of what the puzzle was asking? What other reasonable interpretation of “test this proposition” is there besides “see if the proposition holds”?

I failed to note that every card has a letter on one side and a number on the other side. Had I paid attention there, I would have gotten it though my broader reasoning was correct.

I did note that. I too, got the answer, but added Erhardt’s Cynical Lemma: If you distrust the given, then you need to turn over the F card to see if there’s an E on the other side. 🙂

I use this ‘card problem’ followed by the identical ‘beer ‘n age’ problem, in a lecture about innate understanding (yes, part of an introduction to evolution).

I think 10% is low; in my audience it is closer to 50%. The audience is not average though: all have tertiary education. However, the basically identical ‘beer and age’ problem gets close to 100% right answers. Cosmides and Tooby vindicated?

A very accurate description of an aspect of the postmodern approach: “my own interpretation of the problem was one of many.”

“One thing I learned from the responses is that many readers read the problem as referring to just these 4 cards, and that was useful in finding the answer.”

I’d go further, I’d say that was absolutely essential in finding the answer. Because, if the 4 cards were part of a larger set, while they could potentially disprove the rule, they could never prove it correct. (In which case the only correct answer in terms of the question would be ‘don’t turn over any cards’).

cr

There were two main interpretations of the question which was phrased ambiguously. Interpretation 1, the ‘many cards interpretation’ leads to there being no satisfactory answer. Interpretation 2, the ‘4 card interpretation’ leads to the satisfactory answer of turning over just two cards. Perhaps inadvertently the test could be used to grade people’s logical reasoning thusly:

Incorrect logic without perceiving any ambiguity – 0pts

Correct answer with correct logic pertaining to I2 without perceiving any ambiguity – 5pts

Correct answer (or rather no answer) citing I1 without perceiving any ambiguity – 5pts

Discussion of ambiguity of I1 vs I2 with either throwing up their hands, throwing the card table across the room or refusing to play – 7pts

Perception of ambiguity but reaching the conclusion that only I2 gives a satisfactory answer so must be the correct interpretation – 10pts

I didn’t perceive any ambiguity so under your scoring system I would get 5 points. However, I would argue there is no ambiguity. Any perceived ambiguity is brought to the game by the player only because we are used to decks of cards. Four cards surely must be a subset of a larger deck. Nothing in the problem statement, however, should lead you to believe that; only your experience of card decks does.

OK…I didn’t post my solution yesterday, but that is the answer I came up with.

It’s really a problem in logic rather than mathematics, but then math is really a branch of logic.