Reputation: 23
Gelman and Hill explain simulation in R. On page 139 they write:
52% of adults in the United States are women and 48% are men. The heights of the men are approximately normally distributed with mean 69.1 inches and standard deviation 2.9 inches; women with mean 63.7 and standard deviation 2.7. Suppose we select 10 adults at random. What can we say about their average height?
R code:
sex <- rbinom(10, 1, .52)
height <- ifelse(sex == 0, rnorm(10, 69.1, 2.9), rnorm(10, 64.5, 2.7))  # the two 10s are what I am asking about
avg.height <- mean(height)
print(avg.height)
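One way to answer "what can we say about their average height?" is to repeat the three lines above many times and look at the spread of the averages. The sketch below does this with `replicate()`; the seed, the number of simulations (1000) and the helper name `sim.avg.height` are arbitrary choices for illustration. With the code's parameters (note it uses 64.5 for women, not the 63.7 in the quoted text), the expected average is 0.48 × 69.1 + 0.52 × 64.5 ≈ 66.7 inches.

```r
set.seed(1)       # arbitrary seed, only for reproducibility
n.sims <- 1000    # number of simulated samples (arbitrary)

# Hypothetical helper: one simulated sample of 10 adults, as in the book's code
sim.avg.height <- function() {
  sex <- rbinom(10, 1, .52)   # 1 = woman (prob .52), 0 = man
  height <- ifelse(sex == 0, rnorm(10, 69.1, 2.9), rnorm(10, 64.5, 2.7))
  mean(height)
}

# Average height of 10 adults, simulated n.sims times
avg.heights <- replicate(n.sims, sim.avg.height())

print(mean(avg.heights))                      # close to 66.7
print(sd(avg.heights))                        # theoretical value is about 1.14
print(quantile(avg.heights, c(.025, .975)))   # middle 95% of simulated averages
```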
I don't understand what the two 10s are doing in the rnorm calls.
sex is a vector of length 10. The way I read it, if sex[1] is 0, ifelse draws ten heights from the normal distribution with mean 69.1; if sex[2] is 1, it draws ten heights from the normal distribution with mean 64.5; and so on, so that only the value of sex[10] would determine what finally gets assigned to height.
Obviously my understanding is wrong, because the code does pick the ten values from the right distributions: I changed the mean of the first normal distribution to 669.1 to see how values are assigned to height, and the code above does what it is supposed to do. I still do not understand what the two 10s are doing in the rnorm calls. When I changed the two 10s to 1s, everything works as it should. Can someone please explain how the ten values of height are assigned in the code above?
Upvotes: 1
Views: 368
Reputation: 5152
There actually is a difference when you specify the correct number of observations. Apparently ifelse first evaluates each option once, as a whole vector, like the columns of a data.frame, and then makes the selection row by row: element i of the result comes from element i of the first or the second vector, depending on sex[i]. Because sex is a 10-element vector, the 10s in the rnorm calls are needed to get the right answer. With 1s, each option has a single value that is recycled, so every man gets the same height and every woman gets the same height. You can see this by substituting 1s for the 10s, passing either a one-element or the ten-element vector to ifelse, and resetting the seed before every random generation. See below:
> set.seed(12345)
> sex <- rbinom (10, 1, .52)
>
> set.seed(12345)
> ifelse (sex[1]==0, rnorm (1, 69.1, 2.9), rnorm (1, 64.5, 2.7)) #correct
[1] 70.79803
> set.seed(12345)
> ifelse (sex[1]==0, rnorm (10, 69.1, 2.9), rnorm (10, 64.5, 2.7)) #almost true
[1] 70.79803
> set.seed(12345)
> ifelse (sex==0, rnorm (1, 69.1, 2.9), rnorm (1, 64.5, 2.7)) #wrong
[1] 70.79803 70.79803 70.79803 70.79803 66.41556 66.41556 66.41556 66.41556 70.79803
[10] 70.79803
> set.seed(12345)
> ifelse (sex==0, rnorm (10, 69.1, 2.9), rnorm (10, 64.5, 2.7)) #correct
[1] 70.79803 71.15745 68.78302 67.78486 62.47356 66.70563 62.10683 63.60474 68.27594
[10] 66.43397
> # is this what ifelse is doing?
> set.seed(12345)
> da=data.frame(sex, M=rnorm (10, 69.1, 2.9), W=rnorm (10, 64.5, 2.7))
> da$res <- apply(da,1,function(sx)ifelse(sx[1]==0,sx[2],sx[3]))
> da$res
[1] 70.79803 71.15745 68.78302 67.78486 62.47356 66.70563 62.10683 63.60474 68.27594
[10] 66.43397
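The evaluation behavior described above can also be checked without random numbers. In the sketch below, `draw()` is a hypothetical helper that returns 100, 200, ... and counts how often it is called; it shows that each option is evaluated once as a whole vector, that element i of the result comes from element i of the chosen option, and that a length-1 option is recycled (which is why the `rnorm(1, ...)` version repeats the same two heights). The last lines show an alternative that is not from the book: `rnorm` also recycles vector `mean` and `sd` arguments, so one draw per person can be made without `ifelse` choosing between two full vectors.

```r
# Hypothetical helper: returns 100, 200, ..., n * 100 and counts its calls
calls <- 0
draw <- function(n) {
  calls <<- calls + 1
  seq_len(n) * 100
}

test <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Each option is built once as a full vector; ifelse then takes element i
# from the first where test[i] is TRUE and from the second where it is FALSE
res <- ifelse(test, draw(5), -draw(5))
print(res)    # 100 -200 300 400 -500
print(calls)  # 2: one call per option, not one per element

# A length-1 option is recycled to length(test),
# so every TRUE position gets the same value
res1 <- ifelse(test, 7, c(1, 2, 3, 4, 5))
print(res1)   # 7 2 7 7 5

# Alternative idiom (not the book's code): vector mean and sd, one draw each
set.seed(12345)
sex <- rbinom(10, 1, .52)
height <- rnorm(10,
                mean = ifelse(sex == 0, 69.1, 64.5),
                sd   = ifelse(sex == 0, 2.9, 2.7))
print(mean(height))
```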
Upvotes: 2