Reputation: 2259
I have some rank data with missing values. The highest ranked item was assigned a value of '1'. 'NA' values occur when the item was not ranked.
# sample data
df <- data.frame(Item1 = c(1, 2, NA, 2, 3),
                 Item2 = c(3, 1, NA, NA, 1),
                 Item3 = c(2, NA, 1, 1, 2))
> df
  Item1 Item2 Item3
1     1     3     2
2     2     1    NA
3    NA    NA     1
4     2    NA     1
5     3     1     2
I would like to randomly impute the 'NA' values in each row with the appropriate unranked values. One solution that would meet my goal would be this:
> solution1
  Item1 Item2 Item3
1     1     3     2
2     2     1     3
3     3     2     1
4     2     3     1
5     3     1     2
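To make the goal concrete, a small check along the following lines could confirm that a candidate result is valid: every row must contain each rank from 1 to the number of items exactly once, and every value that was already ranked must stay unchanged. This is only a sketch; the helper name is_valid_imputation is hypothetical and not part of any package.

```r
# Original data, recreated so the check is self-contained
df <- data.frame(Item1 = c(1, 2, NA, 2, 3),
                 Item2 = c(3, 1, NA, NA, 1),
                 Item3 = c(2, NA, 1, 1, 2))

# One candidate result (the solution1 shown above)
solution1 <- data.frame(Item1 = c(1, 2, 3, 2, 3),
                        Item2 = c(3, 1, 2, 3, 1),
                        Item3 = c(2, 3, 1, 1, 2))

# Hypothetical helper: TRUE if `filled` is a valid imputation of `original`
is_valid_imputation <- function(original, filled) {
  orig <- as.matrix(original)
  fill <- as.matrix(filled)
  k <- ncol(orig)
  # each row has k entries, so matching the set 1..k means each rank appears once
  rows_ok <- all(apply(fill, 1, function(r) setequal(r, seq_len(k))))
  # values that were already ranked must not change
  kept_ok <- all(fill[!is.na(orig)] == orig[!is.na(orig)])
  isTRUE(rows_ok && kept_ok)
}

is_valid_imputation(df, solution1)
```

The same check can be run on the output of any imputation approach, which is handy when the random fill is applied to thousands of rows.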
This code gives a list of possible replacement values for each row.
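# set max possible rank in data
max_val <- 3
# calculate row max (equals the number of ranked items when ranks run 1, 2, ... without gaps)
df$row_max <- apply(df, 1, max, na.rm = TRUE)
# calculate number of missing values in each row
df$num_na <- max_val - df$row_max
# set a sample vector
samp_vec <- 1:max_val
# set an empty list (every element starts as NULL)
replacements <- vector(mode = "list", length = nrow(df))
# generate a list of replacements for each row
for (i in seq_len(nrow(df))) {
  if (df$num_na[i] > 0) {
    # ranks not yet used in this row
    candidates <- samp_vec[samp_vec > df$row_max[i]]
    # shuffle by index: sample(3, 1) would draw from 1:3 instead of returning 3
    replacements[[i]] <- candidates[sample.int(length(candidates))]
  }
  # rows without NAs keep their NULL entry; assigning NULL would delete the list element
}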
I am now puzzling over how to assign the values in my list to the missing values in each row of my data.frame. (My actual data has thousands of rows.)
Is there a clean way to do this?
Upvotes: 0
Views: 216
Reputation: 388982
A base R option using apply:
set.seed(123)
df[] <- t(apply(df, 1, function(x) {
  # Get values which are not present in the row
  val <- setdiff(seq_along(x), x)
  # If only 1 value is missing, fill in the one rank that is not present
  if (length(val) == 1) x[is.na(x)] <- val
  # If more than 1 value is missing, fill in the unused ranks in random order
  else if (length(val) > 1) x[is.na(x)] <- sample(val)
  # If nothing is missing, return the row as it is
  x
}))
df
#  Item1 Item2 Item3
#1     1     3     2
#2     2     1     3
#3     2     3     1
#4     2     3     1
#5     3     1     2
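The separate branch for a single missing value is needed because sample() treats a lone number n as the range 1:n. The sketch below shows that behaviour and a variant that avoids the special case by shuffling positions with sample.int(). The function name impute_ranks and its max_rank argument are hypothetical, and the sketch assumes the data frame holds only the item columns and that the observed ranks in each row are distinct values between 1 and max_rank.

```r
# sample() with a single number n draws from 1:n, not from the value itself
set.seed(1)
sample(3, 1)   # may return 1, 2, or 3, not necessarily 3

# Hypothetical helper: fill the NAs in each row with the unused ranks, in random order
impute_ranks <- function(data, max_rank = ncol(data)) {
  m <- as.matrix(data)
  for (i in seq_len(nrow(m))) {
    missing_pos <- which(is.na(m[i, ]))
    if (length(missing_pos) == 0) next   # fully ranked row, nothing to fill
    unused <- setdiff(seq_len(max_rank), m[i, ])
    # sample.int() shuffles positions, so a single unused rank is used as is
    shuffled <- unused[sample.int(length(unused))]
    m[i, missing_pos] <- shuffled[seq_along(missing_pos)]
  }
  as.data.frame(m)
}

# Usage on the original sample data
df <- data.frame(Item1 = c(1, 2, NA, 2, 3),
                 Item2 = c(3, 1, NA, NA, 1),
                 Item3 = c(2, NA, 1, 1, 2))
set.seed(123)
filled <- impute_ranks(df)
filled
```

A plain loop over rows is used here for readability; for data with thousands of rows it should still run quickly, since each iteration only touches one short row.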
Upvotes: 1