Reputation: 107
I have a dataframe of several columns, say it looks like the following:
ID percent region
1 5 1
1 8 2
1 10 3
1 100 4
2 20 1
2 6 2
2 9 3
2 1 4
3 9 1
3 78 2
3 56 3
3 99 4
4 1 1
4 1 2
4 8 3
I need to randomize the "percent" column of the dataset, but the values (and the order of the values) need to stay the same within each individual's block (given by ID). The "region" column and any other columns remain as they are, and only "percent" should be randomized. An example could be the following:
ID percent region
2 20 1
2 6 2
2 9 3
2 1 4
4 1 1
4 1 2
4 8 3
1 5 1
1 8 2
1 10 3
1 100 4
3 9 1
3 78 2
3 56 3
3 99 4
Note that the order of values within IDs of "percent" remains the same.
Upvotes: 2
Views: 154
Reputation: 887501
We can get the distinct 'ID' values, sample them, extract the subset of the dataset matching each sampled 'ID', and bind the subsets together (map_df)
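library(tidyverse)
df1 %>%
distinct(ID) %>%                    # one row per ID
pull(ID) %>%                        # extract the IDs as a vector
sample() %>%                        # shuffle the IDs
map_df(~ df1 %>% filter(ID == .x))  # stack each ID's rows in the shuffled order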
Or a faster option would be to split by 'ID', then rearrange the list elements by sampling the names of the list, and bind the rows (bind_rows)
df1 %>%
split(.$ID) %>%
.[sample(names(.))] %>%
bind_rows
Or we can use base R with the same methodology as above
lst <- split(df1, df1$ID)
df2 <- do.call(rbind, lst[sample(names(lst))])
row.names(df2) <- NULL
df2
# ID percent region
#1 4 1 1
#2 4 1 2
#3 4 8 3
#4 3 9 1
#5 3 78 2
#6 3 56 3
#7 3 99 4
#8 2 20 1
#9 2 6 2
#10 2 9 3
#11 2 1 4
#12 1 5 1
#13 1 8 2
#14 1 10 3
#15 1 100 4
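To try any of the above on the data from the question, a self-contained sketch might look like the following. It rebuilds df1 by hand and, as an alternative to splitting (not one of the approaches shown above), shuffles the blocks with match() and order(). The seed value and the helper names ids, id_order, blocks_before, blocks_after and same_blocks are only illustrative assumptions.

```r
# Rebuild the example data from the question (base R only).
df1 <- data.frame(
  ID      = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  percent = c(5, 8, 10, 100, 20, 6, 9, 1, 9, 78, 56, 99, 1, 1, 8),
  region  = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3)
)

set.seed(42)  # arbitrary seed, only so the shuffle is repeatable

# Shuffle the distinct IDs. Indexing with sample.int() avoids the case where
# sample() on a single number n would draw from 1:n instead.
ids <- unique(df1$ID)
id_order <- ids[sample.int(length(ids))]

# match() gives each row the position of its ID in the shuffled ordering;
# order() sorts by that position and leaves tied rows in their original order.
df2 <- df1[order(match(df1$ID, id_order)), ]
row.names(df2) <- NULL

# Sanity check: each ID block is unchanged, only the order of blocks differs.
blocks_before <- split(df1[c("percent", "region")], df1$ID)
blocks_after  <- split(df2[c("percent", "region")], df2$ID)
same_blocks <- all(mapply(
  function(a, b) all(a$percent == b$percent) && all(a$region == b$region),
  blocks_before, blocks_after
))
same_blocks
```

Because order() leaves ties in their original order, the rows inside each ID keep their sequence, which is the same guarantee the split-based versions give.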
Upvotes: 2