I have a data frame such as
df <- data.frame(matrix(rnorm(40), nrow=20))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=5)
df$score <- rep(c(1,2,3,5), each = 5)
I want to sample the rows, based on the two columns color and score, into two data frames such that each data frame gets an almost equal number of rows from each group. For example, I have 5 rows with the color blue and score 1: I want 2 of them in one data frame and 3 in the other. If a group has six rows, 3 should go to one data frame and 3 to the other.
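To make the target counts concrete, here is a small base R sketch (not part of the original question) that counts the rows in each color/score group and computes how many of them each data frame should receive: floor(n/2) for one and the remainder for the other. The names `sizes`, `first` and `second` are illustrative only.

```r
# Rebuild the example data from the question
df <- data.frame(matrix(rnorm(40), nrow = 20))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 5)
df$score <- rep(c(1, 2, 3, 5), each = 5)

# Number of rows in each color/score combination that actually occurs
sizes <- aggregate(X1 ~ color + score, data = df, FUN = length)
names(sizes)[names(sizes) == "X1"] <- "n"

# Split each group as evenly as possible:
# floor(n/2) rows to one data frame, the remaining ceiling(n/2) to the other
sizes$first  <- sizes$n %/% 2
sizes$second <- sizes$n - sizes$first
sizes
```

With 5 rows per group this gives 2 and 3; a group of 6 rows would give 3 and 3.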
If I've understood correctly, you can try something like:
set.seed(10)
df <- data.frame(matrix(rnorm(40), nrow=20))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=5)
df$score <- rep(c(1,2,3,5), each = 5)
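library(dplyr)
df %>%
  group_by(color, score) %>%
  # within each group, label the rows 1, 0, 1, 0, ... and shuffle the labels,
  # so a group of n rows gets ceiling(n/2) ones and floor(n/2) zeros
  mutate(grp = sample(seq_along(score) %% 2)) %>%
  group_by(grp) %>%
  group_split()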
[[1]]
# A tibble: 8 x 5
      X1     X2 color  score   grp
   <dbl>  <dbl> <chr>  <dbl> <dbl>
1  0.675  0.257 blue       1     0
2 -0.548  0.365 blue       1     0
3 -1.89   0.851 red        2     0
4  1.09  -0.173 red        2     0
5  1.65  -0.500 yellow     3     0
6 -0.186  0.564 yellow     3     0
7 -0.208 -1.70  pink       5     0
8  0.661  0.447 pink       5     0

[[2]]
# A tibble: 12 x 5
        X1      X2 color  score   grp
     <dbl>   <dbl> <chr>  <dbl> <dbl>
 1  0.0555  2.12   blue       1     1
 2 -0.738  -0.843  blue       1     1
 3  0.833  -0.939  blue       1     1
 4 -1.57   -0.172  red        2     1
 5  1.43    0.767  red        2     1
 6  1.14    1.32   red        2     1
 7  1.01    0.997  yellow     3     1
 8 -1.20   -0.357  yellow     3     1
 9  0.474  -0.0911 yellow     3     1
10 -2.44    0.765  pink       5     1
11  1.15    0.463  pink       5     1
12 -0.426   1.53   pink       5     1
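If you would rather avoid dplyr, the same labelling idea can be sketched in base R with ave() and split(). This is an alternative written for illustration, not the answerer's code; the names `parts` and `counts` are hypothetical, and even with the same seed the individual rows chosen need not match the dplyr output above, because the random draws happen in a different order.

```r
set.seed(10)
df <- data.frame(matrix(rnorm(40), nrow = 20))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 5)
df$score <- rep(c(1, 2, 3, 5), each = 5)

# Within each color/score group, build the labels 1, 0, 1, 0, ... and shuffle them;
# a group of n rows gets ceiling(n/2) ones and floor(n/2) zeros
df$grp <- ave(df$score, df$color, df$score,
              FUN = function(x) sample(seq_along(x) %% 2))

# One data frame per label: a list with elements named "0" and "1"
parts <- split(df, df$grp)

# Rows per color/score group in each data frame (differ by at most one)
counts <- table(paste(df$color, df$score), df$grp)
counts
```

Here each group of 5 rows contributes 2 rows to parts[["0"]] and 3 rows to parts[["1"]], giving 8 and 12 rows in total, the same sizes as the tibbles above.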