Reputation: 2529
I have a data set with 100 rows and I have a string of 4 values (A, B, C, D) I want to randomly assign to the rows. However, I want to assign A to 30 rows, B to 20 rows, C to 10 rows, and D to 40 rows. How would I go about this?
df <- data.frame(ID=c(1:100))
values <- c("A", "B", "C", "D")
One way I have thought of is to generate a randomly ordered list of the numbers 1-100 and assign the first 30 to A, the next 20 to B, and so on, but I imagine there is a much better way to do it than this.
Upvotes: 2
Views: 5228
Reputation: 44525
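Here are two options. The first one probabilistically assigns values to a column in df. This does not guarantee that there will be exactly 30, 20, 10, and 40 each of A, B, C, D, respectively. Rather, those are only the counts you get in expectation. Note that sampling must be done with replacement here, because 100 draws are taken from only 4 values.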
df$values <- sample(values, nrow(df), replace = TRUE, prob = c(.3, .2, .1, .4))
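To see why the first option only matches the target counts on average, a quick check like the following may help. It is just a sketch: the seed is arbitrary and `draw` is a name made up for this illustration.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
values <- c("A", "B", "C", "D")

# 100 independent draws with the desired probabilities
draw <- sample(values, 100, replace = TRUE, prob = c(.3, .2, .1, .4))

# Tabulate the result; the counts will usually be close to,
# but not exactly, 30/20/10/40, and they change from run to run
table(draw)
```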
This second option is probably what you want. It randomly samples rows from the dataframe (essentially shuffling the rows), uses those as extraction indices (inside []), and then assigns to that shuffled set of rows a vector of values A, B, C, D created using rep to ensure exactly 30, 20, 10, and 40 occurrences of each value, respectively.
df$values[sample(1:nrow(df), nrow(df), FALSE)] <- rep(values, c(30,20,10,40))
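An equivalent way to write the exact-count approach, in case it reads more easily, is to build the labels with rep and shuffle that vector directly rather than shuffling the row indices. This is a sketch assuming the same df and values as in the question; `counts` is a name introduced here for illustration.

```r
df <- data.frame(ID = 1:100)
values <- c("A", "B", "C", "D")
counts <- c(30, 20, 10, 40)  # must sum to nrow(df)

# rep() builds exactly 30 A's, 20 B's, 10 C's and 40 D's;
# sample() with no size argument returns a random permutation of them
df$values <- sample(rep(values, counts))

# Each label now appears exactly the requested number of times
table(df$values)
```

Either way the counts are fixed and only the order is random, which is what distinguishes this from sampling with prob.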
Upvotes: 10