Reputation: 8107
I'm trying to go through each row of my data frame, randomly select half of the variables, and set those variables to NA for that row.
For example, with the mydf dataset below, I'd like the first row to have 3 randomly selected variables (say QB, QE, QF) set to NA, then the same for the 2nd row (say QA, QD, QE), and so forth:
library(tibble)
mydf <- tibble(QA = rnorm(100),
               QB = rnorm(100),
               QC = rnorm(100),
               QD = rnorm(100),
               QE = rnorm(100),
               QF = rnorm(100))
My attempt, but it doesn't appear to do anything:
vars <- names(mydf)
for (i in nrow(mydf)){
  miss_vars <- sample(vars, 3)
  for (j in miss_vars) {
    mydf[i,j] <- NA
    #mydf[i,][[j]] <- NA #Also tried this.
  }
}
Upvotes: 1
Views: 534
Reputation: 12937
Try this vectorized approach:
m <- as.matrix(mydf)
n <- 3 # number of randoms to be selected
inds <- cbind(rep(1:nrow(mydf), each=n), c(replicate(nrow(mydf), sample(ncol(mydf), n))))
m[inds] <- NA
res <- as.data.frame(m)
Here is how it works: inds is a two-column matrix in which each row pairs a row index with a randomly chosen column index of the data frame, and m[inds] <- NA sets exactly those cells to NA.
In res, you will have a data frame in which 3 randomly chosen columns are set to NA in each row. The output for the provided data frame is:
#            QA          QB          QC        QD         QE         QF
# 1  -0.6264538          NA          NA  1.358680 -0.1645236         NA
# 2   0.1836433          NA  0.78213630        NA -0.2533617         NA
# 3          NA          NA  0.07456498        NA  0.6969634  0.3411197
# 4          NA -2.21469989          NA        NA  0.5566632 -1.1293631
# 5          NA  1.12493092  0.61982575        NA         NA  1.4330237
# 6  -0.8204684 -0.04493361          NA        NA         NA  1.9803999
# 7   0.4874291 -0.01619026          NA -0.394290         NA         NA
# 8   0.7383247          NA -1.47075238        NA         NA -1.0441346
# 9          NA  0.82122120          NA  1.100025         NA  0.5697196
# 10         NA  0.59390132  0.41794156        NA         NA -0.1350546
Data:
set.seed(1)
mydf <- data.frame(QA = rnorm(10),
                   QB = rnorm(10),
                   QC = rnorm(10),
                   QD = rnorm(10),
                   QE = rnorm(10),
                   QF = rnorm(10))
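To make the indexing step more concrete, here is a minimal sketch of the same idea on a small stand-in matrix. The 4-row matrix, the seed, and the names rows, cols, and na_per_row are illustrative assumptions, not part of the answer above.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Small stand-in for as.matrix(mydf): 4 rows, 6 columns (hypothetical data)
m <- matrix(rnorm(24), nrow = 4,
            dimnames = list(NULL, c("QA", "QB", "QC", "QD", "QE", "QF")))
n <- 3  # number of cells to blank out per row

# One draw of n distinct column indices per row; replicate() returns an
# n x nrow(m) matrix, and c() flattens it column by column.
cols <- c(replicate(nrow(m), sample(ncol(m), n)))

# Repeat each row index n times so it lines up with the flattened columns
rows <- rep(seq_len(nrow(m)), each = n)

# Each row of inds addresses one (row, column) cell of m
inds <- cbind(rows, cols)
m[inds] <- NA

# Count the missing values in each row; every count should equal n
na_per_row <- rowSums(is.na(m))
stopifnot(all(na_per_row == n))
```

Because sample() draws without replacement by default, the n columns picked for a given row are distinct, so each row ends up with exactly n missing values.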
Upvotes: 1
Reputation: 8107
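The loop should have iterated over every row with seq_len(nrow(mydf)); for (i in nrow(mydf)) runs only once, with i equal to the number of rows: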
for (i in seq_len(nrow(mydf))){
  miss_vars <- sample(vars, 3)
  for (j in miss_vars) {
    mydf[i,][[j]] <- NA
  }
}
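The difference between the two loop headers can be checked on a small example. This is only a sketch under stated assumptions: the 5-row base data.frame, the choice of 2 columns per row, and the names df, buggy, and fixed are hypothetical, and the inner loop is replaced by a single assignment to several columns for brevity.

```r
set.seed(1)  # arbitrary seed for repeatability

# Small hypothetical data frame: 5 rows, 4 columns
df <- data.frame(QA = rnorm(5), QB = rnorm(5), QC = rnorm(5), QD = rnorm(5))
vars <- names(df)

# Original header: nrow(buggy) is the single number 5,
# so the body runs once, with i = 5
buggy <- df
for (i in nrow(buggy)) {
  buggy[i, sample(vars, 2)] <- NA
}

# Corrected header: seq_len(nrow(fixed)) is 1, 2, ..., 5,
# so the body runs once per row
fixed <- df
for (i in seq_len(nrow(fixed))) {
  fixed[i, sample(vars, 2)] <- NA
}

rowSums(is.na(buggy))  # only the last row has missing values
rowSums(is.na(fixed))  # every row has 2 missing values
```

seq_len() is also safer than 1:nrow(df) when a data frame has zero rows, because seq_len(0) is empty while 1:0 yields c(1, 0).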
Upvotes: 1