Victor

Reputation: 23

R: How does vectorized ifelse work in the context of multi-value yes/no parameters?

Gelman and Hill are explaining simulation in R. On page 139 they say:

52% of adults in the United States are women and 48% are men. The heights of the men are approximately normally distributed with mean 69.1 inches and standard deviation 2.9 inches; women with mean 63.7 and standard deviation 2.7. Suppose we select 10 adults at random. What can we say about their average height?

sex <- rbinom (10, 1, .52)
height <- ifelse (sex==0, rnorm (10, 69.1, 2.9), rnorm (10, 64.5, 2.7))     # the two 10s are the ones in question
avg.height <- mean (height)
print (avg.height)
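A single run gives only one average. One way to say something about the average height, in the spirit of the simulation approach, is to repeat the 10-adult draw many times and look at the spread of the averages. This is a minimal sketch that reuses the parameters from the code above (including 64.5 for women); the 1000 replications and the seed are arbitrary choices for illustration.

```r
# Repeat the 10-adult simulation many times to see how the average varies.
# n_sims = 1000 and the seed are arbitrary, illustrative choices.
set.seed(1)
n_sims <- 1000
avg_height <- replicate(n_sims, {
  sex <- rbinom(10, 1, 0.52)  # 1 = woman (p = .52), 0 = man
  height <- ifelse(sex == 0, rnorm(10, 69.1, 2.9), rnorm(10, 64.5, 2.7))
  mean(height)
})
# Summarise the simulated distribution of the 10-person average
c(mean = mean(avg_height), sd = sd(avg_height))
```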

I don't understand what the two 10s are doing in the rnorm calls.

My reading was: sex is a vector of length 10. If sex[1] is 0, ifelse draws ten heights from the normal distribution with mean 69.1 ...; if sex[2] is 1, it draws ten heights from the normal distribution with mean 64.5; and so on, so that only the value of sex[10] would determine what is eventually assigned to height.

Obviously my understanding is incorrect, because the code does pick the ten values from the right distributions. To see how it assigns values to height, I changed the mean of the first normal distribution to 669.1, and the code above does what it is supposed to do. I still do not understand what the two 10s are doing in the rnorm calls: when I changed them to 1s, everything seemed to work as it should. Can someone please explain how the ten values of height are assigned in the code above?
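A small deterministic example makes the element-by-element behaviour of ifelse visible before any random numbers are involved. This sketch uses made-up vectors (test, and the numbers 1 to 4 and 10 to 40) purely for illustration; the last part repeats the 1-draw versus 10-draw comparison with the question's parameters.

```r
# ifelse() works position by position: element i of the result is yes[i]
# when test[i] is TRUE and no[i] when it is FALSE.
test <- c(TRUE, FALSE, TRUE, FALSE)
ifelse(test, c(1, 2, 3, 4), c(10, 20, 30, 40))
# [1]  1 20  3 40

# A length-1 yes/no is recycled to length(test), so every TRUE position
# gets the same value and every FALSE position gets the same value.
ifelse(test, 1, 10)
# [1]  1 10  1 10

# So with rnorm(1, ...) all men share one height and all women share another,
# while rnorm(10, ...) gives each of the 10 people an independent draw.
set.seed(12345)
sex <- rbinom(10, 1, 0.52)
h1  <- ifelse(sex == 0, rnorm(1, 69.1, 2.9), rnorm(1, 64.5, 2.7))
h10 <- ifelse(sex == 0, rnorm(10, 69.1, 2.9), rnorm(10, 64.5, 2.7))
length(unique(h1))   # at most 2 distinct heights
length(unique(h10))  # typically 10 distinct heights
```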

Upvotes: 1

Views: 368

Answers (1)

Robert

Reputation: 5152

Actually, the number of draws does matter. ifelse(test, yes, no) first evaluates yes and no as whole vectors, recycling them to the length of test if they are shorter, somewhat like building a data.frame with both options as columns. It then goes row by row, taking yes[i] where test[i] is TRUE and no[i] where it is FALSE. Since sex is a 10-element vector, each rnorm call needs 10 draws so that every person gets their own height; with rnorm(1, ...) the single draw is recycled, so all the men get one identical height and all the women another. You can see this by substituting 1s for the 10s, passing a one- or ten-element test to ifelse, and resetting the seed before every random generation. See below:

> set.seed(12345)
> sex <- rbinom (10, 1, .52) 
> 
> set.seed(12345)
> ifelse (sex[1]==0, rnorm (1, 69.1, 2.9), rnorm (1, 64.5, 2.7)) #correct
[1] 70.79803
> set.seed(12345)
> ifelse (sex[1]==0, rnorm (10, 69.1, 2.9), rnorm (10, 64.5, 2.7)) #almost true
[1] 70.79803
> set.seed(12345)
> ifelse (sex==0, rnorm (1, 69.1, 2.9), rnorm (1, 64.5, 2.7)) #wrong
 [1] 70.79803 70.79803 70.79803 70.79803 66.41556 66.41556 66.41556 66.41556 70.79803
[10] 70.79803
> set.seed(12345)
> ifelse (sex==0, rnorm (10, 69.1, 2.9), rnorm (10, 64.5, 2.7)) #correct
 [1] 70.79803 71.15745 68.78302 67.78486 62.47356 66.70563 62.10683 63.60474 68.27594
[10] 66.43397
> # is this what ifelse is doing?
> set.seed(12345)
> da=data.frame(sex, M=rnorm (10, 69.1, 2.9), W=rnorm (10, 64.5, 2.7))
> da$res <- apply(da,1,function(sx)ifelse(sx[1]==0,sx[2],sx[3]))
> da$res
 [1] 70.79803 71.15745 68.78302 67.78486 62.47356 66.70563 62.10683 63.60474 68.27594
[10] 66.43397
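The data.frame/apply version above matches ifelse because both take element i from the "men" draws or the "women" draws. The same idea can be written as a short function. my_ifelse is a hypothetical, simplified re-implementation for illustration only: it skips the NA handling and attribute preservation that base::ifelse also performs, but it uses the same recycle-then-select logic.

```r
# Hypothetical, simplified ifelse(): recycle yes/no, then select per element.
# NA values in test and attributes are NOT handled here, unlike base::ifelse().
my_ifelse <- function(test, yes, no) {
  n <- length(test)
  yes_full <- rep(yes, length.out = n)  # recycle yes to length(test)
  no_full  <- rep(no,  length.out = n)  # recycle no to length(test)
  out <- no_full                        # start from the "no" values
  out[test] <- yes_full[test]           # use yes[i] wherever test[i] is TRUE
  out
}

# Compare with base::ifelse() on the same pre-drawn heights
set.seed(12345)
sex   <- rbinom(10, 1, 0.52)
men   <- rnorm(10, 69.1, 2.9)
women <- rnorm(10, 64.5, 2.7)
identical(my_ifelse(sex == 0, men, women), ifelse(sex == 0, men, women))
# [1] TRUE
```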

Upvotes: 2
