Random samples from each column of a data.frame containing NAs

Question

This is a follow-up to this question. I want to draw a random sample from each row of a data.frame, independently of the other rows. The data.frame may contain NAs, as in the example df below.

set.seed(12345)
df1 <- c(rnorm(n=4, mean=0, sd=1), NA)
df2 <- rnorm(n=5, mean=10, sd=1)
df <- rbind(df1, df2)

t(apply(df, 1, sample, replace=TRUE))

         [,1]     [,2]       [,3]     [,4]    [,5]
df1 0.5855288       NA -0.1093033 0.709466      NA
df2 9.7238159 9.723816  8.1820440 9.723816 10.6301

From the first row I want to draw four observations (its non-NA columns) with replacement, and from the second row I want to draw five observations (its non-NA columns) with replacement, independently of the first draw. However, the code above draws five observations with replacement from each row, so NA can be sampled from the first row.

akrun · Accepted Answer

I guess you want to sample only from the non-NA values. In that case, !is.na can be used to drop the NA values before sampling from the remaining values. The output will be a list ('lst'), because the number of elements differs between rows (4 and 5) after sampling.

  lst <- apply(df, 1, function(x) sample(x[!is.na(x)], replace=TRUE))
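A quick check, sketched under the assumption that df is the matrix built in the question, suggests that each list element contains no NA and has as many draws as its row has non-missing values. One caveat worth knowing: apply returns a list only when the results differ in length; if every row had the same number of non-NA values, it would simplify the result to a matrix instead.

```r
set.seed(12345)
df1 <- c(rnorm(n = 4, mean = 0, sd = 1), NA)
df2 <- rnorm(n = 5, mean = 10, sd = 1)
df  <- rbind(df1, df2)

# Resample only the non-NA values of each row, with replacement
lst <- apply(df, 1, function(x) sample(x[!is.na(x)], replace = TRUE))

lengths(lst)        # number of draws per row: 4 for df1, 5 for df2
anyNA(unlist(lst))  # FALSE: no NA was sampled
```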

If we need to convert the list back to a matrix, we can pad each list element with NA at the end so that all elements have the same length, and then use rbind to bind them into a matrix.

  do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
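The two steps can be combined into one self-contained sketch. The helper name resample_row is hypothetical and not part of the answer. It samples by index with sample.int, which sidesteps a quirk of sample(): when a row has exactly one non-NA value n >= 1, sample(n) draws from 1:n rather than returning n. Using lapply over row indices also guarantees a list even when all rows have the same number of non-NA values.

```r
set.seed(12345)
df1 <- c(rnorm(n = 4, mean = 0, sd = 1), NA)
df2 <- rnorm(n = 5, mean = 10, sd = 1)
df  <- rbind(df1, df2)

# Hypothetical helper: resample the non-NA values of one row with replacement.
# Sampling indices avoids sample()'s special case for a single number.
resample_row <- function(x) {
  vals <- x[!is.na(x)]
  vals[sample.int(length(vals), replace = TRUE)]
}

# One list element per row; lapply never simplifies to a matrix
lst <- lapply(seq_len(nrow(df)), function(i) resample_row(df[i, ]))
names(lst) <- rownames(df)

# Pad each element with NA up to the longest length, then bind rows into a matrix
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
```

Here res is a matrix with one row per input row; shorter rows carry their NA padding at the end rather than at random positions.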
