Reputation: 39
I have a data frame such as
a = c(2,NA,3,4)
b = c(NA,3,NA,NA)
c= c(5,NA,7,9)
test = data.frame(a,b,c)
> test
   a  b  c
1  2 NA  5
2 NA  3 NA
3  3 NA  7
4  4 NA  9
I would like to fill in only NA values in test$b with the average of test$a and test$c for that row. The result should be
   a   b  c
1  2 3.5  5
2 NA   3 NA
3  3   5  7
4  4 6.5  9
I have tried the apply family but haven't gotten anywhere. I would like to avoid a for loop because I have been told to avoid them.
In English I want to say,
if test$b[i] is NA, test$b[i] = (test$a[i] + test$c[i])/2
else leave test$b[i] as it is.
I'm sure this kind of question has been answered many times but I can't find (or recognise) something analogous. Thanks in advance.
Upvotes: 1
Views: 500
Reputation: 887851
You can create a logical row index ('indx') marking the rows where the 'b' column is NA, then use it to replace those NA values with the `rowMeans` of the same rows. Because 'b' is itself NA in those rows, na.rm=TRUE drops it, so the mean is taken over the columns other than 'b'. (Modified based on comments from @thelatemail)
indx <- is.na(test$b)
test$b[indx] <- rowMeans(test[indx,], na.rm=TRUE)
test
#    a   b  c
# 1  2 3.5  5
# 2 NA 3.0 NA
# 3  3 5.0  7
# 4  4 6.5  9
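As a variant, here is a minimal sketch that names the columns to average explicitly instead of relying on na.rm=TRUE to drop 'b'. It assumes the frame from the question, and it also assumes you want 'b' to stay NA when 'a' or 'c' is missing in that row. That is a different choice from the code above, which would average whatever values are present.

```r
# Rebuild the example data from the question.
test <- data.frame(a = c(2, NA, 3, 4),
                   b = c(NA, 3, NA, NA),
                   c = c(5, NA, 7, 9))

# Logical index of the rows where 'b' is missing.
indx <- is.na(test$b)

# Average only 'a' and 'c' for those rows. Without na.rm = TRUE,
# a missing 'a' or 'c' gives NA for that row (an assumed preference).
test$b[indx] <- rowMeans(test[indx, c("a", "c")])

test
#    a   b  c
# 1  2 3.5  5
# 2 NA 3.0 NA
# 3  3 5.0  7
# 4  4 6.5  9
```

For this data both versions give the same result, since 'a' and 'c' are present in every row where 'b' is NA.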
Upvotes: 5