Reputation: 75
I have a dataframe that looks like this:
x y location
21 10 ny
12 22 ny
32 90 cha
33 14 cha
...
I want to randomly sample the rows of the x and y columns based on percentages: 30% of the rows should be randomly assigned to group1 and 70% to group2. Something like this:
x y location group
21 10 ny group1
12 22 ny group2
32 90 cha group2
33 14 cha group2
...
I think I can do this with mutate(), but I don't know how to write the code. Thank you for your help.
Upvotes: 0
Views: 621
Reputation: 887391
We could use base R:
df <- transform(df, group = sample(c("group1", "group2"), nrow(df),
                                   replace = TRUE, prob = c(0.3, 0.7)))
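A minimal, self-contained sketch of the base R approach on the four example rows from the question. The set.seed() call is an addition (not part of the answer) so that the random assignment is reproducible between runs.

```r
# Example data reproduced from the question
df <- data.frame(x = c(21, 12, 32, 33), y = c(10, 22, 90, 14),
                 location = c("ny", "ny", "cha", "cha"))

set.seed(123)  # added only to make the random draw reproducible

# Each row independently gets 'group1' with probability 0.3
# and 'group2' with probability 0.7
df <- transform(df, group = sample(c("group1", "group2"), nrow(df),
                                   replace = TRUE, prob = c(0.3, 0.7)))
print(df)
```

Because every row is drawn independently, a data frame this small can easily end up with all rows in one group.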
Upvotes: 1
Reputation: 389065
You can use sample() and set the probability of each group with its prob argument.
library(dplyr)
df <- df %>%
  mutate(group = sample(c('group1', 'group2'), n(),
                        replace = TRUE, prob = c(0.3, 0.7)))
Because sample() assigns each row independently according to the probabilities, with 100 rows in df you will not necessarily get exactly 70 rows assigned to 'group2'. As the number of rows increases, the share of 'group2' rows gets closer to 70%.
If you want an exact 70%/30% split, use rep() instead:
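A quick simulation illustrates this: it draws labels with the same sample() call for a few data frame sizes and reports the share that landed in 'group2'. The sizes and the seed are arbitrary choices for the illustration.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

# Arbitrary row counts to compare
sizes <- c(100, 1000, 100000)

# For each size, draw the labels exactly as in the answer and
# compute the fraction assigned to 'group2'
props <- sapply(sizes, function(n_rows) {
  g <- sample(c("group1", "group2"), n_rows, replace = TRUE, prob = c(0.3, 0.7))
  mean(g == "group2")
})

print(data.frame(rows = sizes, share_group2 = props))
```

The share for 100 rows typically deviates from 0.7 by a few percentage points, while the share for 100,000 rows is very close to 0.7.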
n_group2 <- round(nrow(df) * 0.7)  # number of rows that go to 'group2'
df <- df %>%
  mutate(group = sample(rep(c('group1', 'group2'), c(n() - n_group2, n_group2))))
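The same exact-split idea also works without dplyr. This sketch (base R only, using the example rows from the question) builds a label vector containing exactly the right number of each group with rep() and then shuffles it with sample(), which returns a random permutation when no size is given.

```r
# Example data reproduced from the question
df <- data.frame(x = c(21, 12, 32, 33), y = c(10, 22, 90, 14),
                 location = c("ny", "ny", "cha", "cha"))

n_group2 <- round(nrow(df) * 0.7)  # 4 * 0.7 = 2.8, which rounds to 3

# Exactly n_group2 'group2' labels, the remaining rows 'group1',
# shuffled so the positions are random
df$group <- sample(rep(c("group1", "group2"), c(nrow(df) - n_group2, n_group2)))

table(df$group)
```

Only the positions of the labels are random here; the counts are fixed, so with small data frames the split is as close to 70/30 as rounding allows.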
Upvotes: 1