MYaseen208

Reputation: 23898

Random samples from each column of a data.frame containing NAs

This is a follow-up to this question. I want to draw a random sample from each row of a data.frame, independently of the other rows. The data may contain NAs, as in the example data.frame df below.

set.seed(12345)
df1 <- c(rnorm(n=4, mean=0, sd=1), NA)
df2 <- rnorm(n=5, mean=10, sd=1)
df <- rbind(df1, df2)

t(apply(df, 1, sample, replace=TRUE))

         [,1]     [,2]       [,3]     [,4]    [,5]
df1 0.5855288       NA -0.1093033 0.709466      NA
df2 9.7238159 9.723816  8.1820440 9.723816 10.6301

From the first row I want to select four observations (the non-NA columns) with replacement, and from the second row five observations (the non-NA columns) with replacement, independently of the first selection. However, my code selects five observations with replacement from each row, so the NA in the first row can be drawn, as in the output above.
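The per-row sample sizes being asked for are simply the number of non-NA entries in each row. A quick check on the question's data is shown below. Note that `rbind()` of two numeric vectors actually returns a numeric matrix rather than a data.frame. `n_obs` is just an illustrative variable name.

```r
set.seed(12345)
df1 <- c(rnorm(n = 4, mean = 0, sd = 1), NA)
df2 <- rnorm(n = 5, mean = 10, sd = 1)
df <- rbind(df1, df2)

# rbind() of two numeric vectors returns a numeric matrix, not a data.frame
is.matrix(df)       # TRUE

# Desired sample size per row = number of non-NA entries in that row
n_obs <- rowSums(!is.na(df))
n_obs               # df1: 4, df2: 5
```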

Upvotes: 2

Views: 548

Answers (1)

akrun

Reputation: 886938

I guess you want to sample only from the non-NA values. In that case, !is.na can be used to drop the NA values, and then we sample from the remaining values. The output will be a list ('lst') because the number of elements differs between the rows (4 and 5) after sampling.

  lst <- apply(df, 1, function(x) sample(x[!is.na(x)], replace=TRUE))
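The sketch below uses the same idea but iterates with `lapply()` over row indices instead of `apply()`. This guards against two documented base-R behaviours. First, `apply()` returns a list only when the per-row results differ in length. If every row had the same number of non-NA values, it would simplify to a matrix with one column per row. Second, `sample(x)` treats a single numeric value `x >= 1` as `1:x`, so a row with exactly one non-NA value would be mishandled. Resampling by index with `sample.int()` avoids that. The helper name `sample_rows_non_na` is hypothetical, not part of any package.

```r
# Hypothetical helper: resample (with replacement) the non-NA values of each row
sample_rows_non_na <- function(m) {
  res <- lapply(seq_len(nrow(m)), function(i) {
    x <- m[i, ]
    x <- x[!is.na(x)]                         # keep only the non-NA values
    x[sample.int(length(x), replace = TRUE)]  # resample by index; safe when length(x) == 1
  })
  names(res) <- rownames(m)                   # keep row names (NULL if the matrix has none)
  res
}

set.seed(12345)
df1 <- c(rnorm(n = 4, mean = 0, sd = 1), NA)
df2 <- rnorm(n = 5, mean = 10, sd = 1)
df <- rbind(df1, df2)

lst <- sample_rows_non_na(df)
lengths(lst)   # df1: 4, df2: 5
```

Because the result is always a list, the padding step that follows works the same way whether or not the rows have equal numbers of non-NA values.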

If we need to convert the list back to a matrix, we can pad each list element with NA at the end so that all elements have the same length, and then use rbind to combine them into a matrix.

  do.call(rbind,lapply(lst, `length<-`, max(lengths(lst))))
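To see what the padding step does in isolation, here is a small sketch on a toy list. The values are illustrative, not the question's data. The replacement function `` `length<-` `` returns its first argument extended to the requested length, filling the new positions with NA. `do.call(rbind, ...)` then stacks the equal-length vectors, taking the row names from the list names.

```r
# Toy list with unequal element lengths (illustrative values only)
lst <- list(df1 = c(1, 2, 3, 4), df2 = c(5, 6, 7, 8, 9))

# Pad every element with NA up to the length of the longest one
padded <- lapply(lst, `length<-`, max(lengths(lst)))

# Stack the now equal-length vectors into a 2 x 5 matrix
m <- do.call(rbind, padded)
m   # row df1 ends in NA; row df2 is unchanged
```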

Upvotes: 1
