Andican

Reputation: 13

Why is R's which function not returning the "correct" answer?

I'm writing a variant of the Monty Hall problem, building on another person's code. The difference is that instead of 3 doors, I have "n" doors. Let's say n = 4 for this question. The doors are labeled A, B, C and D.

The code is as follows:

n <- 4
doors <- LETTERS[seq( from = 1, to = n )]
xdata = c()
for(i in 1:10000) {
    prize <- sample(doors)[1]
    pick  <- sample(doors)[1]
    open1 <- doors[which(doors != pick & doors != prize)]
    open  <- sample(open1,n-2)

    # the line with the problem
    switchyes <- doors[which( doors != open & doors != pick)]

    if(pick==prize) {
        xdata <- c(xdata, "noswitchwin")
    }
    if(switchyes==prize) {
        xdata=c(xdata, "switchwin")
    }
}

When I run the code, I get the warning:

There were 50 or more warnings (use warnings() to see the first 50)

The problem seems to be due to the line:

switchyes <- doors[which( doors != open & doors != pick)]

This should return only one item (C), since the condition doors != open & doors != pick should eliminate doors A, B and D. However, I'm getting two items, B and C. Does anybody see what's going on?

length(which(xdata == "switchwin"))
# [1] 4728
length(which(xdata == "noswitchwin"))
# [1] 2424
switchyes
# [1] "B" "C"
open
# [1] "B" "D"
open1
# [1] "B" "D"
pick
# [1] "A"
prize
# [1] "C"

Upvotes: 0

Views: 144

Answers (1)

Arun

Reputation: 118839

The problem is the use of != when the LHS and RHS differ in length:

p <- letters[1:4] 
# [1] "a" "b" "c" "d"

q <- c("a", "e", "d", "d")
# [1] "a" "e" "d" "d"

p == q
# [1]  TRUE FALSE FALSE  TRUE

p != q
# [1] FALSE  TRUE  TRUE FALSE

What is happening? Since p and q are of equal length, each element of p is compared to the value at the corresponding index of q. Now, what if we change q to this:

q <- c("b", "d")

p == q
# [1] FALSE FALSE FALSE  TRUE

What's happening here? Since the length of q (RHS) is not equal to p (LHS), q gets recycled to get to the length of p. That is,

# p    q  p    q
  a == b, b == d # first two comparisons
  c == b, d == d # recycled comparisons
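Applied to the values from the question, the same recycling reproduces the unexpected "B" "C". This is a small sketch that hard-codes the question's printed values of open and pick; the names open_recycled, implicit and explicit are introduced here only for illustration.

```r
# Values copied from the output shown in the question
doors <- LETTERS[1:4]        # "A" "B" "C" "D"
open  <- c("B", "D")
pick  <- "A"

# What R effectively compares against: open repeated to the length of doors
open_recycled <- rep_len(open, length(doors))   # "B" "D" "B" "D"

implicit <- doors != open            # recycling happens silently here
explicit <- doors != open_recycled   # the same comparison, spelled out
print(identical(implicit, explicit))

# Door "B" survives because it is compared against "D", not against "B"
switchyes <- doors[implicit & doors != pick]
print(switchyes)
```

Because the length of doors (4) is a multiple of the length of open (2), R recycles without any warning about the lengths.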

Instead you should use

!(doors %in% open) & !(doors %in% pick)

Also, note that !A AND !B = !(A OR B), so you could rewrite this as

!(doors %in% open | doors %in% pick)

In turn, this could be simplified to use only one %in% as:

!(doors %in% c(open, pick))
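As a quick check, the three forms can be compared on the values from the question. This is only a sketch; the names form1, form2 and form3 are made up for the comparison.

```r
# Values copied from the output shown in the question
doors <- LETTERS[1:4]
open  <- c("B", "D")
pick  <- "A"

form1 <- !(doors %in% open) & !(doors %in% pick)
form2 <- !(doors %in% open | doors %in% pick)   # De Morgan's law
form3 <- !(doors %in% c(open, pick))            # single membership test

# All three logical vectors should be identical
print(identical(form1, form2) && identical(form2, form3))

# Only the one remaining door is selected
print(doors[form3])
```

Unlike != and ==, %in% tests each element of doors for membership in the whole right-hand vector, so the lengths of the two sides do not need to match.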

Further, you could create a function using Negate, say %nin% (corresponding to !(x %in% y)), and replace the ! and %in% in the above statement as follows:

`%nin%` <- Negate(`%in%`)
doors %nin% c(open, pick) # note the %nin% here

So basically your statement assigning to switchyes could read just:

# using %nin% after defining the function
switchyes <- doors[doors %nin% c(open, pick)]

You don't need to use which here as you are not looking for indices. You can directly use the logicals here to get the result. Hope this helps.
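Putting it together, the simulation from the question might be rewritten along these lines. This is a sketch rather than a definitive version: the counters switch_wins and stay_wins replace the growing xdata vector, and the seed, the trials variable and the stopifnot check are additions that are not in the original code.

```r
set.seed(1)                     # arbitrary seed, only for reproducibility
n      <- 4
doors  <- LETTERS[seq_len(n)]
trials <- 10000

switch_wins <- 0
stay_wins   <- 0

for (i in seq_len(trials)) {
  prize <- sample(doors, 1)
  pick  <- sample(doors, 1)

  # The host opens n - 2 doors that are neither the pick nor the prize
  open <- sample(doors[!(doors %in% c(pick, prize))], n - 2)

  # Exactly one door is left to switch to
  switchyes <- doors[!(doors %in% c(open, pick))]
  stopifnot(length(switchyes) == 1)

  if (pick == prize)      stay_wins   <- stay_wins + 1
  if (switchyes == prize) switch_wins <- switch_wins + 1
}

print(c(switch = switch_wins, stay = stay_wins) / trials)
```

With switchyes always of length one, every trial is counted exactly once, so the two counters add up to the number of trials. With n = 4 and n - 2 doors opened, switching should win in roughly three quarters of the trials.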

Upvotes: 2
