Reputation: 1270
I wan to generate 300 random data based on the following criteria:
Class value
0 1-8
1 9-11
2 12-14
3 15-16
4 17-20
Logic: when class = 0, I want to get random data between 1-8. Or when class= 1, I want to get random data between 9-11 and so on.
This gives me the following hypothetical table as an example:
Class Value
0 7
0 4
1 10
1 9
1 11
. .
. .
I want to have equal and unequal mixtures in each class
Upvotes: 0
Views: 482
Reputation: 6496
Providing some additional flexibility in the code, so that different probabilities can be used in the sampling, and having the smallest possible amount of hard-coded values:
# load data.table
library(data.table)
# this is the original data
a = structure(list(Class = 0:4, value = c("1-8", "9-11", "12-14",
"15-16", "17-20")), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
# this is to replace "-" by ":", we will use that in a second
a[, value := gsub("\\-", ":", value)]
# this is a vector of EQUAL probabilities
probs = rep(1/a[, uniqueN(Class)], a[, uniqueN(Class)])
# This is a vector of UNEQUAL Probabilities. If wanted, it should be
# uncommented and adjusted manually
# probs = c(0.05, 0.1, 0.2, 0.4, 0.25)
# This is the number of Class samples wanted
numberOfSamples = 300
# This is the working horse
a[sample(.N, numberOfSamples, TRUE, prob = probs), ][,
smpl := apply(.SD,
1,
function(x) sample(eval(parse(text = x)), 1)),
.SDcols = "value"][,
.(Class, smpl)]
What is good about this code?
a
, as I called it)Upvotes: 1
Reputation: 173793
You could do:
df <- data.frame(Class = sample(0:4, 300, TRUE))
df$Value <- sapply(list(1:8, 9:11, 12:14, 15:16, 17:20)[df$Class + 1],
sample, size = 1)
This gives you a data frame with 300 rows and appropriate numbers for each class:
head(df)
#> Class Value
#> 1 0 3
#> 2 1 10
#> 3 4 19
#> 4 2 12
#> 5 4 19
#> 6 1 10
Created on 2022-12-30 with reprex v2.0.2
Upvotes: 2