noLongerRandom

Reputation: 531

Confusion Between 'sample' and 'rbinom' in R

Why are these not equivalent?

#First generate 10 numbers between 0 and .5
set.seed(1)
x <- runif(10, 0, .5)

These are the two statements I'm confused by:

#First    
sample(rep(c(0,1), length(x)), size = 10, prob = c(rbind(1-x,x)), replace = F)
#Second
rbinom(length(x), size = 1, prob=x)

I was originally trying to use 'sample'. What I thought I was doing was generating ten (0,1) pairs, then assigning the probability that each would return either a 0 or a 1.

The second one works and gives me the output I need (trying to run a sim). So I've been able to solve my problem. I'm just curious as to what's going on under the hood with 'sample' so that I can understand R better.

Upvotes: 1

Views: 2291

Answers (2)

Worice

Reputation: 4037

The difference between the two functions is quite simple.

Think of a pack of shuffled cards, and choose a number of cards from it. That is exactly the situation that sample simulates. This code,

> set.seed(123)
> sample(1:40, 5)
[1] 12 31 16 33 34

randomly extracts five numbers from the vector 1:40.

In your rbinom call you set size = 1, but in sample the size argument is the number of elements drawn from the pool: size = 1 would pick a single element, whereas size = 10, as in your sample call, returns ten values.

> set.seed(1)
> x <- runif(10, 0, .5)
> sample(rep(c(0,1), length(x)), size = 10, prob = c(rbind(1-x,x)), replace = F)
[1] 0 0 0 0 0 0 0 1 0 1
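As a rough sketch of what sample does with prob here: the vector is read as relative weights over the whole pool of 20 elements, not as ten separate (1 - p, p) pairs, so output position i is not tied to x[i]. The names pool, w, draws and per_element are only illustrative, and the last part assumes the intent was one 0/1 draw per element of x.

```r
set.seed(1)
x <- runif(10, 0, .5)

pool <- rep(c(0, 1), length(x))   # 0 1 0 1 ... (20 elements)
w <- c(rbind(1 - x, x))           # 1-x[1], x[1], 1-x[2], x[2], ...

# sample() reads w as relative weights over all 20 pool elements at once
sum(w)   # about 10, so these are not ten separate (1 - p, p) pairs

# Repeat the original call many times and look at each output position
draws <- replicate(2000, sample(pool, size = 10, prob = w, replace = FALSE))
round(cbind(freq_of_1 = rowMeans(draws), intended = x), 2)

# Assumed intent: one weighted draw from c(0, 1) per element of x
per_element <- vapply(
  x,
  function(p) sample(c(0, 1), size = 1, prob = c(1 - p, p)),
  numeric(1)
)
per_element
```

If you run this, the freq_of_1 column should not be expected to track the intended column, whereas the per-element version uses x[i] for position i by construction.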

The rbinom function, by contrast, simulates trials with discrete outcomes, such as coin flips. Its prob argument is the probability of success on each trial, so a fair coin has prob = 0.5. Below we simulate 100 flips of a fair coin. If you suspect the coin is biased toward one outcome, you can simulate that by setting prob to 0.8, as in the second call.

> set.seed(123)
> table(rbinom(100, 1, prob = 0.5))
 0  1 
53 47 

> table(rbinom(100, 1, prob = 0.8))
 0  1 
19 81 
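In the question, prob is a vector rather than a single number. rbinom recycles its arguments across the n draws, so draw i uses x[i]. A quick empirical check, as a sketch only (sims is an illustrative name and 5000 repetitions is an arbitrary choice):

```r
set.seed(1)
x <- runif(10, 0, .5)

# Each column is one call to rbinom(); row i always uses prob = x[i]
sims <- replicate(5000, rbinom(length(x), size = 1, prob = x))

# Observed frequency of 1 per position next to the intended probability
round(cbind(observed = rowMeans(sims), intended = x), 3)
```

The two columns should agree up to simulation noise, which is the per-element behaviour the question was after.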

Upvotes: 1

IRTFM

Reputation: 263411

The first difference is where the length of the result is specified in the argument list. The name size has a different meaning in each of the two functions. (I hadn't thought about that source of confusion before, and I'm sure I have made this error myself many times.)

The random number generators (functions starting with r followed by a distribution suffix) take the length of the result as their first parameter, n, whereas sample takes it as its second parameter, size. In sample, the draw is from the values in the first argument, and size is the length of the vector to create. In rbinom, n is the length of the vector to create, while size is the number of items hypothetically drawn from a theoretical urn whose distribution is determined by prob. The value returned for each of the n draws is the number of "ones" among those size items. So both of your calls return ten values, but only in sample does that ten come from size; in rbinom it comes from n = length(x), and size = 1 means a single trial per value. To see size at work, try:

rbinom(length(x), size = 10, prob=x)
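To make the n versus size distinction concrete, here is a small sketch; the names a, b and c1 and the 0.3 probability are just for illustration.

```r
set.seed(1)
x <- runif(10, 0, .5)

# n controls the length of the result; size is the number of trials per draw
a  <- rbinom(n = length(x), size = 1,  prob = x)    # ten values, each 0 or 1
b  <- rbinom(n = length(x), size = 10, prob = x)    # ten counts, each in 0..10
c1 <- rbinom(n = 1,         size = 10, prob = 0.3)  # a single count in 0..10

c(length(a), length(b), length(c1))
range(b)
```

Changing size never changes how many values come back, only how large each count can be.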

Regarding the argument to prob: I don't think you need the c().

Upvotes: 3
