Reputation: 2800
How can I remove NA cases in a column and calculate the mean value of a factor at the same time?
With this code I calculate the mean of DC1 for Group_A in the data frame x:
test.mean <- mean(x$DC1[x$Groups=="Group_A"])
However, some DC1 values in Group_A are NA. To remove the NA cases from DC1, I run this code, where DC1 is the 3rd column:
test.filterNA <- x[complete.cases(x[ , 3]), ]
How can I combine both steps into one simple line?
Upvotes: 0
Views: 7768
Reputation: 887128
There are a couple of options to deal with this situation. Here, the column 'Groups' has some missing values. With the == operator, NA values are returned as NA:
c(1:3, NA) == 2
#[1] FALSE TRUE FALSE NA
When we subset another column with the logical index above, the NA positions in the index come back as NA elements.
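To make the effect concrete, here is a minimal sketch on a made-up data frame (the values are hypothetical and chosen only for illustration; only the column names come from the question). It shows how a single NA in 'Groups' propagates through the index and into the mean.

```r
# Hypothetical toy data: one missing value in Groups, none in DC1
x <- data.frame(
  Groups = c("Group_A", NA, "Group_A", "Group_B"),
  DC1    = c(10, 20, 30, 40)
)

idx <- x$Groups == "Group_A"
idx               # TRUE NA TRUE FALSE

x$DC1[idx]        # 10 NA 30  (the NA index yields an NA element)
mean(x$DC1[idx])  # NA
```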
If the function to be applied has an option for removing missing values, it can be used. In the case of mean, there is na.rm, which is FALSE by default. Change it to TRUE and it should work:
mean(x$DC1[x$Groups == "Group_A"], na.rm = TRUE)
Another option is to make the NA values return FALSE instead. This can be done by adding another logical condition, & !is.na(x$Groups):
mean(x$DC1[x$Groups=="Group_A" & !is.na(x$Groups)])
If there are no NA values in 'DC1', this works fine. To be safe, it may be better to add na.rm = TRUE as well.
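As a rough sketch of combining the two safeguards, the example below uses a made-up data frame with missing values in both 'Groups' and 'DC1' (the data and the names keep and result are hypothetical, not from the question).

```r
# Hypothetical toy data: NA in Groups and NA in DC1
x <- data.frame(
  Groups = c("Group_A", NA, "Group_A", "Group_A", "Group_B"),
  DC1    = c(10, 20, NA, 30, 40)
)

# !is.na(x$Groups) turns the NA comparison into FALSE
keep <- x$Groups == "Group_A" & !is.na(x$Groups)
keep    # TRUE FALSE TRUE TRUE FALSE

# na.rm = TRUE drops the NA that is still present in DC1 itself
result <- mean(x$DC1[keep], na.rm = TRUE)
result  # 20
```

The index handles missing group labels and na.rm handles missing measurements, so the two are not redundant when both columns can contain NA.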
A third option is to use %in%, which always returns TRUE or FALSE, never NA:
mean(x$DC1[x$Groups %in% "Group_A"])
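A small sketch of how %in% behaves, again on a made-up data frame (hypothetical values; in_a is an illustrative name). Note that %in% only cleans up the index: if 'DC1' itself has an NA inside the group, as the question describes, na.rm = TRUE is still needed.

```r
# Hypothetical toy data: NA in Groups, and an NA in DC1 within Group_A
x <- data.frame(
  Groups = c("Group_A", NA, "Group_A", "Group_B"),
  DC1    = c(10, 20, NA, 40)
)

in_a <- x$Groups %in% "Group_A"
in_a                             # TRUE FALSE TRUE FALSE

mean(x$DC1[in_a])                # NA, because DC1 has an NA in Group_A
mean(x$DC1[in_a], na.rm = TRUE)  # 10
```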
Upvotes: 2
Reputation: 12155
Two options from @akrun:
mean(x$DC1[x$Groups == "Group_A"], na.rm = TRUE)
or
mean(x$DC1[x$Groups=="Group_A" & !is.na(x$Groups)])
Upvotes: 2