moseno

Reputation: 61

Sampled data in R, how to replace randomly selected elements with 0

I would like to reproduce all rows of my data set, with each value repeated only with a certain probability. The probability of a value being repeated depends on the initial row. How can I set, for every element in a row, the chance of it being repeated?

Here is a little example data frame.

data <- data.frame(id = rep(c("01", "02", "03"), 4),
                   X1 = c(100, 60, 90, 0, 60, 90, 0, 60, 0, 100, 60, 0),
                   X2 = c(0, 60, 90, 0, 60, 0, 0, 0, 90, 0, 0, 90))
head(data)
  id  X1 X2
1 01 100  0
2 02  60 60
3 03  90 90
4 01   0  0
5 02  60 60
6 03  90  0

The first column and every element equal to 0 should always be repeated. A nonzero numerical value in a row should be repeated with probability 9/10. (I expect a new data frame in which the id column and the 0-valued elements are repeated unchanged.)

A possible example output:

head(rep)
  id X1 X2
1 01  0  0
2 02 60 60
3 03 90  0
4 01  0  0
5 02  0 60
6 03 90  0

I am having trouble defining the prob= argument of sample() for the rows.

Any ideas?

Upvotes: 0

Views: 819

Answers (1)

josliber

Reputation: 44299

Basically your question boils down to how to replace randomly selected elements of your data with 0. You can do this pretty simply with runif, in this case replacing each value with 0 with probability 0.1:

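set.seed(144)
# For each numeric column, draw one uniform number per element and
# set the element to 0 when the draw falls below 0.1 (probability 0.1)
data[-1] <- sapply(data[-1], function(x) ifelse(runif(length(x)) < 0.1, 0, x))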
data
#    id  X1 X2
# 1  01   0  0
# 2  02  60 60
# 3  03  90 90
# 4  01   0  0
# 5  02  60 60
# 6  03  90  0
# 7  01   0  0
# 8  02  60  0
# 9  03   0 90
# 10 01 100  0
# 11 02  60  0
# 12 03   0 90

With this random seed, the only change was in the first row of your example data, where X1 went from 100 to 0.
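
Since the question asks specifically about the prob= argument of sample(), here is a rough sketch of the same idea written that way. The helper name zero_out and its p_keep parameter are made up for illustration; they are not part of base R or of the code above. It draws one keep/drop flag per element and multiplies the column by those flags, assuming the columns after id are numeric.

```r
# Hypothetical helper (name and p_keep argument are assumptions for this
# sketch): keep each value with probability p_keep, otherwise set it to 0.
zero_out <- function(x, p_keep = 0.9) {
  # One flag per element: 1 = keep, 0 = replace with 0.
  # prob= gives the weights for c(1, 0) in that order.
  keep <- sample(c(1, 0), size = length(x), replace = TRUE,
                 prob = c(p_keep, 1 - p_keep))
  x * keep
}

data <- data.frame(id = rep(c("01", "02", "03"), 4),
                   X1 = c(100, 60, 90, 0, 60, 90, 0, 60, 0, 100, 60, 0),
                   X2 = c(0, 60, 90, 0, 60, 0, 0, 0, 90, 0, 0, 90))

set.seed(144)
new_data <- data
# Apply the helper to every column except id
new_data[-1] <- lapply(data[-1], zero_out)
new_data
```

Elements that are already 0 stay 0 whatever flag they get, and the id column is never touched, which matches the rule in the question. The draws will not line up with the runif version even under the same seed, so expect a different set of zeroed cells.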

Upvotes: 1
