user2120963

Reputation: 141

R Using Sample to Create Column of Matrix with Random numbers

With the help of people on this site I have a matrix y that looks similar to this (this is a much simplified version).

1,3
1,3
1,3
7,1
8,2
8,2

I have created a third column of random numbers (intended to be without replacement within each of the repeating chunks) using this code: j=cbind(y,sample(1:99999,y[,2],replace=FALSE)).

Matrix j looks like this:

1,3,4520
1,3,7980
1,3,950
7,1,2
8,3,4520
8,3,7980
8,3,950

How do I obtain truly random numbers for my third column such that, for each group of repeating rows (here of sizes 3, then 1, then 2), I get random numbers that are not repeated within that group (replace = FALSE)?

Upvotes: 2

Views: 3406

Answers (3)

agstudy

Reputation: 121568

I can't do this without a loop; maybe someone else can find a more elegant solution. As I see it, the problem is to sample with repetition within a group and without repetition between groups.

## assumes dat is the matrix as a data frame with columns V1 and V2, rows sorted by group
ll <- split(dat, paste(dat$V1, dat$V2, sep = ''))
z <- rep(0, nrow(dat))

SET <- seq(1, 100)  ## 100 can be replaced by 99999, for example
v <- 1
for (i in seq_along(ll)) {
  SET <- setdiff(SET, z)   ## drop values already drawn for earlier groups
  nn  <- nrow(ll[[i]])     ## size of the current group
  z[v:(v + nn - 1)] <- sample(SET, nn, replace = TRUE)
  v <- v + nn
}

 z
[1]  35  77  94 100  23  59

Upvotes: 1

Jouni Helske

Reputation: 6477

This should get you what you want:

j <- cbind(y, unlist(sapply(unique(y[,2]), function(n) sample(1:99999, n))))

Edit: there was an error in the code. The call to unique is of course needed.
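Note that the one-liner identifies groups by their size alone, so two groups of the same size would be collapsed by unique, and sapply only returns a list when the sizes differ. A hedged sketch of a variant that draws once per run of identical consecutive rows instead (the names key, runs and draws, the reconstructed y, and the seed are illustrative assumptions, not part of the answer):

```r
# Example matrix from the question (construction assumed)
y <- cbind(c(1, 1, 1, 7, 8, 8),
           c(3, 3, 3, 1, 2, 2))

set.seed(1)  # arbitrary seed, only for reproducibility

# Label each row by its contents, then measure runs of identical consecutive rows
key  <- paste(y[, 1], y[, 2])
runs <- rle(key)$lengths          # 3, 1, 2 for the example matrix

# One independent draw without replacement per run; lapply always returns a list
draws <- lapply(runs, function(n) sample(1:99999, n, replace = FALSE))

j <- cbind(y, unlist(draws))
```

Values cannot repeat within a run here, but the same value may still appear in two different runs, since each run is sampled independently.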

Upvotes: 1

Arun

Reputation: 118789

Why this happens:

The problem is that the structure of the sample command is:

sample(vector of values, how many?, replace = FALSE or TRUE)

Here, "how many?" is supposed to be ONE value. Since you provide the whole second column of y, it just picks the first value, which is 3, and so it reads as:

set.seed(45) # just for reproducibility
sample(1:99999, 3, replace = F)

And for this seed, the values are:

# [1] 63337 31754 24092

And since there are only 3 values and you're binding them to your matrix with 6 rows, it "recycles" the values (meaning it repeats the values in the same order). So you get:

#      [,1] [,2]  [,3]
# [1,]    1    3 63337
# [2,]    1    3 31754
# [3,]    1    3 24092
# [4,]    7    1 63337
# [5,]    8    2 31754
# [6,]    8    2 24092

See that the values repeat. For the matrix you've shown, I've no idea how the 7,1,2 occurs, as the first value of y[,2] in your matrix is 3.
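Both effects can be checked directly. The sketch below rebuilds the question's matrix (an assumption, since the original construction of y is not shown; the name draw is also illustrative) and confirms that only 3 values are drawn and that cbind repeats them in the same order:

```r
# Example matrix from the question (construction assumed)
y <- cbind(c(1, 1, 1, 7, 8, 8),
           c(3, 3, 3, 1, 2, 2))

set.seed(45)  # just for reproducibility
draw <- sample(1:99999, y[, 2], replace = FALSE)
length(draw)  # 3: only the first element of y[, 2] is used as the size

# cbind recycles the 3 values to fill all 6 rows
j <- cbind(y, draw)
identical(unname(j[1:3, 3]), unname(j[4:6, 3]))  # TRUE: rows 4-6 repeat rows 1-3
```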

What you should do instead:

y <- cbind(y, sample(1:99999, nrow(y), replace = FALSE))

This asks sample to generate nrow(y) = 6 (here) values without replacement. This generates 6 distinct values, which are then bound to your matrix y.
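A quick way to confirm the result, sketched on a reconstructed copy of the question's matrix (the construction of y is an assumption), is to test the new column for duplicates:

```r
# Example matrix from the question (construction assumed)
y <- cbind(c(1, 1, 1, 7, 8, 8),
           c(3, 3, 3, 1, 2, 2))

set.seed(45)  # just for reproducibility
y <- cbind(y, sample(1:99999, nrow(y), replace = FALSE))

anyDuplicated(y[, 3])  # 0 means no value appears twice
```

Because the values are distinct across all rows, they are necessarily distinct within each repeating group as well. The approach requires nrow(y) to be at most 99999, since sample cannot draw more values than the population holds when replace = FALSE.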

Upvotes: 5
