bill999

Reputation: 2529

randomly assign without replacement using numbers

I have a data set with 100 rows and a vector of 4 values (A, B, C, D) that I want to randomly assign to the rows. However, I want to assign A to exactly 30 rows, B to 20 rows, C to 10 rows, and D to 40 rows. How would I go about this?

df <- data.frame(ID=c(1:100))
values <- c("A", "B", "C", "D")

One way I have thought of is to generate a randomly ordered list of the numbers 1-100 and assign A to the first 30, B to the next 20, and so on, but I imagine there is a much better way to do it than this.

Upvotes: 2

Views: 5228

Answers (1)

Thomas

Reputation: 44525

Here are two options. The first one assigns values to a column in df probabilistically: each row gets A, B, C, or D with probability .3, .2, .1, and .4, respectively. This does not guarantee exactly 30, 20, 10, and 40 of A, B, C, and D; those are only the expected counts.

df$values <- sample(values, nrow(df), replace = TRUE, prob = c(.3, .2, .1, .4))
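To see that the counts from this first option vary from run to run, one can tabulate a draw. The following is only a sketch: the seed and the name `draw` are arbitrary choices for illustration, and `replace = TRUE` is used because drawing 100 values from a vector of 4 is not possible without replacement.

```r
# Probabilistic assignment: counts match 30/20/10/40 only in expectation.
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

df <- data.frame(ID = 1:100)
values <- c("A", "B", "C", "D")

# Each row independently gets A, B, C, or D with the given probabilities.
draw <- sample(values, nrow(df), replace = TRUE, prob = c(.3, .2, .1, .4))

# Tabulate the draw; factor() keeps all four levels even if one is absent.
table(factor(draw, levels = values))
```

Rerunning with a different seed will generally give different counts, which is why this option is not suitable when exact group sizes are required.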

This second option is probably what you want. It randomly samples all row indices of the data frame without replacement (essentially shuffling the rows) and uses them as extraction indices (inside []). It then assigns to that shuffled set of rows a vector of the values A, B, C, D built with rep, which guarantees exactly 30, 20, 10, and 40 occurrences of each value, respectively.

df$values[sample(1:nrow(df), nrow(df), FALSE)] <- rep(values, c(30,20,10,40))
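An equivalent way to write the exact-count version is to shuffle the vector of labels instead of the row indices. This is a sketch of that variant rather than the approach given above; the names `counts` and `labels` are introduced here for illustration only.

```r
# Exact-count assignment by shuffling the labels themselves.
df <- data.frame(ID = 1:100)
values <- c("A", "B", "C", "D")
counts <- c(30, 20, 10, 40)  # must sum to nrow(df)

# Build 30 A's, 20 B's, 10 C's, 40 D's in order.
labels <- rep(values, times = counts)

# sample(x) with no size argument returns a random permutation of x.
df$values <- sample(labels)

# Verify the group sizes: A = 30, B = 20, C = 10, D = 40.
table(df$values)
```

Because the labels are only permuted, never redrawn, the group sizes are fixed regardless of the random seed; only which rows receive which label changes.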

Upvotes: 10
