I want to randomize the order of one factor (id), and the second factor (visit) should be randomized within each level of the first. How do I do that?
id <- rep(c(10,20,30), each=3)
visit <- rep(1:3,3)
df <- data.frame(id, visit)
df
  id visit
1 10     1
2 10     2
3 10     3
4 20     1
5 20     2
6 20     3
7 30     1
8 30     2
9 30     3
It could, for example, look like this:
  id visit
1 20     1
2 20     3
3 20     2
4 30     3
5 30     2
6 30     1
7 10     1
8 10     2
9 10     3
Here is code to randomise the order of the ids, but I don't know how to put this in a function and then also randomise the second column within each id.
library(magrittr)  # provides the %>% pipe
uniq <- unique(df[,1]) %>% sample()
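Building on that line, here is a minimal base-R sketch of such a function. The name shuffle_nested and its arguments are hypothetical. It shuffles the unique ids (without the %>% pipe, and using sample.int() so that a single id is not mistaken for sample(n)), regroups the rows in that order with order(match()), and permutes visit within each id with ave().

```r
# Hypothetical helper: the name and arguments are illustrative, not from the question.
# Shuffles the order of the `outer` groups, then permutes `inner` within each group.
shuffle_nested <- function(d, outer, inner) {
  uniq <- unique(d[[outer]])
  uniq <- uniq[sample.int(length(uniq))]                  # random order of outer levels
  d <- d[order(match(d[[outer]], uniq)), , drop = FALSE]  # regroup rows in that order
  # permute the inner values within each outer group (safe for 1-row groups)
  d[[inner]] <- ave(d[[inner]], d[[outer]],
                    FUN = function(v) v[sample.int(length(v))])
  rownames(d) <- NULL
  d
}

id <- rep(c(10, 20, 30), each = 3)
visit <- rep(1:3, 3)
df <- data.frame(id, visit)

set.seed(1)
res <- shuffle_nested(df, "id", "visit")
res
```

order() leaves ties in their original order, so each id's rows stay together after the regrouping, and ave() then only permutes values inside each id.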
You could sample the visits within each unique id, using lapply():
set.seed(42)
# for each id, shuffle that id's visits; assumes dat is sorted by id
dat$visit <- unlist(lapply(unique(dat$id), function(i)
  sample(dat$visit[dat$id == i])))
dat
#   id visit
# 1 10     2
# 2 10     1
# 3 10     3
# 4 20     3
# 5 20     1
# 6 20     2
# 7 30     3
# 8 30     1
# 9 30     2
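The lapply()/unlist() line above writes the shuffled values back by position, so it relies on dat being sorted by id (each id's rows contiguous and in order of first appearance). If that might not hold, a sketch using base R's split<- replacement function writes each shuffled group back to that group's own rows instead. The unsorted example data below is made up for illustration.

```r
# Illustrative data, deliberately NOT sorted by id
dat <- data.frame(id    = c(20, 10, 30, 10, 20, 30, 10, 30, 20),
                  visit = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

set.seed(42)
# split<- assigns each permuted group back to the rows it came from
split(dat$visit, dat$id) <- lapply(split(dat$visit, dat$id),
                                   function(v) v[sample.int(length(v))])
dat
```

Because the values go back to their own group's row positions, the row order of dat does not matter here; ave() performs the same split-and-reassign internally.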
Edit: To also shuffle the row order, you could sample the rows afterwards with dat[sample(nrow(dat)), ]. Note that this shuffles individual rows, so the rows of an id no longer stay together. Or, all combined in a transform():
set.seed(42)
transform(dat,
          visit=unlist(lapply(unique(dat$id), function(i)
            sample(dat$visit[dat$id == i]))))[sample(nrow(dat)), ]
#   id visit
# 8 30     3
# 7 30     2
# 4 20     1
# 1 10     1
# 5 20     2
# 2 10     3
# 9 30     1
# 3 10     2
# 6 20     3
To shuffle the order of the id blocks, keeping each id's rows together, with the visits sampled within each block, you could use a by() approach:
set.seed(42)
do.call(rbind, by(dat, dat$id, function(x) {
  transform(x, visit=sample(visit))
})[sample(seq_along(unique(dat$id)))])
#      id visit
# 30.7 30     2
# 30.8 30     3
# 30.9 30     1
# 20.4 20     1
# 20.5 20     2
# 20.6 20     3
# 10.1 10     1
# 10.2 10     3
# 10.3 10     2
Explanation: by() splits the data by id into a list of data frames, each of which is transformed as above; the list is then put into random order with sample() and combined into the resulting data frame with rbind().
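The same split, shuffle, and recombine mechanism can also be written with split() and lapply() directly. This is a sketch of an alternative to by(), not the answer's exact code; it also resets the "30.7"-style row names that rbind() produces from the list names.

```r
dat <- expand.grid(visit = 1:3, id = (1:3) * 10)[2:1]

set.seed(42)
blocks <- split(dat, dat$id)                  # one data frame per id
blocks <- lapply(blocks, function(b) {
  b$visit <- b$visit[sample.int(nrow(b))]     # shuffle visits inside the block
  b
})
res <- do.call(rbind, blocks[sample.int(length(blocks))])  # random block order
rownames(res) <- NULL                         # replace "30.7"-style names with 1..n
res
```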
Data:
(dat <- expand.grid(visit=1:3, id=(1:3)*10)[2:1])
#   id visit
# 1 10     1
# 2 10     2
# 3 10     3
# 4 20     1
# 5 20     2
# 6 20     3
# 7 30     1
# 8 30     2
# 9 30     3