Reputation: 141
With the help of people on this site I have a matrix y that looks similar to this (but much simplified):
1,3
1,3
1,3
7,1
8,2
8,2
I have created a third column of random numbers (meant to be drawn without replacement for each of the repeating chunks) using this code: j=cbind(y,sample(1:99999,y[,2],replace=FALSE)).
Matrix j looks like this:
1,3,4520
1,3,7980
1,3,950
7,1,2
8,3,4520
8,3,7980
8,3,950
How do I obtain truly random numbers for my third column such that, for each of the repeating chunks (3 rows, then 1, then 2), I get random numbers that are not repeated within that chunk (replace = FALSE)?
Upvotes: 2
Views: 3406
Reputation: 121568
I can't get this without a loop. Maybe someone else can find a more elegant solution. As I understand it, the problem is to sample with repetition within a group and without repetition between groups:
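dat <- data.frame(V1 = c(1, 1, 1, 7, 8, 8), V2 = c(3, 3, 3, 1, 2, 2))  # the question's data
key <- paste(dat$V1, dat$V2, sep = '')
ll <- split(dat, factor(key, levels = unique(key)))  # one chunk per group, in row order
z <- rep(0, nrow(dat))
SET <- seq(1, 100) ## we can change 100 by 99999 for example
v <- 1
for (i in seq_along(ll)) {
  SET <- setdiff(SET, z)  # drop values already used by earlier groups
  nn <- nrow(ll[[i]])
  z[v:(v + nn - 1)] <- sample(SET, nn, replace = TRUE)  # repeats allowed within a group
  v <- v + nn
}
z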
[1] 35 77 94 100 23 59
Upvotes: 1
Reputation: 6477
This should get you what you want:
j <- cbind(y, unlist(sapply(unique(y[,2]), function(n) sample(1:99999, n))))
Edit: there was an error in the code. The call to unique is of course needed.
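One caveat with the one-liner above: unique(y[,2]) collapses chunks that happen to have the same size, so it relies on every chunk having a different length. Here is a minimal sketch of a variant that takes the chunk sizes from the first column instead. It assumes that rows of one group are adjacent and that y[,1] identifies the group; both are assumptions, not something the question states, and `sizes` and `draws` are illustrative names.

```r
# Example matrix from the question
y <- cbind(c(1, 1, 1, 7, 8, 8), c(3, 3, 3, 1, 2, 2))

# Run lengths of the group id in column 1: 3, 1, 2
# (assumes rows belonging to one group are adjacent)
sizes <- rle(y[, 1])$lengths

# Draw each chunk separately, without replacement inside the chunk
draws <- unlist(lapply(sizes, function(n) sample.int(99999, n)))

j <- cbind(y, draws)
j
```

Values are only guaranteed to be distinct within a chunk here; the same number could, rarely, show up in two different chunks.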
Upvotes: 1
Reputation: 118789
Why this happens:
The problem is that the structure of the sample command is:
sample(vector of values, how many?, replace = FALSE or TRUE)
Here, "how many?" is supposed to be ONE value. Since you provide the whole second column of y, it just picks the first value, which is 3, and so it reads as:
set.seed(45) # just for reproducibility
sample(1:99999, 3, replace = FALSE)
And for this seed, the values are:
# [1] 63337 31754 24092
And since there are only 3 values and you're binding them to your matrix with 6 rows, it "recycles" the values (meaning it repeats them in the same order). So, you get:
#      [,1] [,2]  [,3]
# [1,]    1    3 63337
# [2,]    1    3 31754
# [3,]    1    3 24092
# [4,]    7    1 63337
# [5,]    8    2 31754
# [6,]    8    2 24092
See that the values repeat. For the matrix you've shown, I have no idea how the row 7,1,2 occurs, as the first value of y[,2] in your matrix is 3.
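The recycling step can be seen in isolation, without any randomness. In this small sketch three fixed numbers stand in for the three sampled values (`short` is just an illustrative name):

```r
# Example matrix from the question
y <- cbind(c(1, 1, 1, 7, 8, 8), c(3, 3, 3, 1, 2, 2))

# Three fixed values stand in for the three sampled numbers
short <- c(10, 20, 30)

# cbind() recycles the 3-element vector to fill all 6 rows
j <- cbind(y, short)
j[, 3]
# [1] 10 20 30 10 20 30
```

Because 6 is a multiple of 3, cbind() does this silently, which is why the repetition is easy to miss.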
What you should do instead:
y <- cbind(y, sample(1:99999, nrow(y), replace = FALSE))
This asks sample to generate nrow(y) = 6 (here) values without replacement. It produces 6 distinct values, which are then bound to your matrix y.
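As a quick check, here is a sketch that rebuilds the example matrix and confirms that the new column has no repeats, either overall or inside any chunk. The seed is arbitrary, and grouping by the first column is an assumption about how the chunks are identified.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Example matrix from the question
y <- cbind(c(1, 1, 1, 7, 8, 8), c(3, 3, 3, 1, 2, 2))

# One draw of nrow(y) values, all distinct
j <- cbind(y, sample(1:99999, nrow(y), replace = FALSE))

# TRUE when no value occurs twice in the whole column
anyDuplicated(j[, 3]) == 0

# Per-chunk check (grouped by column 1): 0 means no duplicate in that chunk
tapply(j[, 3], j[, 1], anyDuplicated)
```

Uniqueness over the whole column is stronger than what the question asks for, so it covers the per-chunk requirement automatically.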
Upvotes: 5