Reputation: 61
I would like to repeat all rows in my data set, with each value repeated only with some probability. The probability of a value being repeated depends on the initial row. How can I set, for every element in a row, its chance of being repeated?
Here is a small example data frame.
data <- data.frame(id = rep(c("01", "02", "03"),4),
X1 = c(100,60,90,0,60,90,0,60,0,100,60,0),
X2 = c(0,60,90,0,60,0,0,0,90,0,0,90))
head(data)
id X1 X2
1 01 100 0
2 02 60 60
3 03 90 90
4 01 0 0
5 02 60 60
6 03 90 0
The first column and any value equal to 0 should always be repeated.
Each nonzero numerical value in a row should be repeated with probability 9/10. (I expect a new data frame with the id
column and the 0-valued elements repeated.)
A possible example output:
head(rep)
id X1 X2
1 01 0 0
2 02 60 60
3 03 90 0
4 01 0 0
5 02 0 60
6 03 90 0
I have trouble defining the prob= argument of sample() for rows.
Any idea?
Upvotes: 0
Views: 819
Reputation: 44299
Basically your question boils down to how to replace randomly selected elements of your data with 0. You can do this pretty simply with runif, in this case replacing each value with 0 with probability 0.1:
set.seed(144)
data[-1] <- sapply(data[-1], function(x) ifelse(runif(length(x)) < 0.1, 0, x))
data
# id X1 X2
# 1 01 0 0
# 2 02 60 60
# 3 03 90 90
# 4 01 0 0
# 5 02 60 60
# 6 03 90 0
# 7 01 0 0
# 8 02 60 0
# 9 03 0 90
# 10 01 100 0
# 11 02 60 0
# 12 03 0 90
With this random seed, the only change was in the first row of your example data.
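Since the question asked specifically about prob= in sample(), here is a minimal sketch of the same idea written that way. It draws a keep/drop flag per cell rather than per row, which seems to be what the example output wants. The names keep_prob, mask_column and rep_data are made up for illustration, and 0.9 is the 9/10 chance from the question.

```r
set.seed(1)

# Example data from the question
data <- data.frame(id = rep(c("01", "02", "03"), 4),
                   X1 = c(100, 60, 90, 0, 60, 90, 0, 60, 0, 100, 60, 0),
                   X2 = c(0, 60, 90, 0, 60, 0, 0, 0, 90, 0, 0, 90))

# Assumed probability that a value is kept (9/10 in the question)
keep_prob <- 0.9

# Hypothetical helper: for each element draw 1 (keep) or 0 (replace with 0),
# using sample() with prob= giving the chance of each outcome
mask_column <- function(x, p = keep_prob) {
  keep <- sample(c(1, 0), size = length(x), replace = TRUE, prob = c(p, 1 - p))
  x * keep
}

# Apply to every column except id; zeros stay zero because 0 * keep is 0
rep_data <- data
rep_data[-1] <- lapply(data[-1], mask_column)
rep_data
```

Multiplying by the 0/1 flag leaves the id column and existing zeros untouched, so only nonzero values can change. If different rows need different chances, one option would be to pass a vector of probabilities and draw with runif(length(x)) < p instead, since prob= in sample() weights the outcomes, not the individual rows.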
Upvotes: 1