antecessor

Reputation: 2800

Removing NA cases and calculating mean of a factor in R

How can I remove NA cases in a column and calculate the mean value of a factor at the same time?

With this code I calculate the mean value of DC1 in Group_A, within the data frame x:

test.mean <- mean(x$DC1[x$Groups=="Group_A"])

However, some of the DC1 values in Group_A are NA. To remove the NA cases from DC1, I run this code, where DC1 is the 3rd column:

test.filterNA <- x[complete.cases(x[ , 3]), ]

How can I combine both steps into one simple line?

Upvotes: 0

Views: 7768

Answers (2)

akrun

Reputation: 887128

There are a couple of options for dealing with this situation. Here, the 'Groups' column has some missing values. With the == operator, comparisons against NA return NA:

c(1:3, NA) == 2
#[1] FALSE  TRUE FALSE    NA

When we subset another column with the logical index above, the NA positions come back as NA in the result.
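As a small illustration with made-up vectors (the names groups and dc1 are hypothetical, not taken from the question), indexing with a logical vector that contains NA yields an NA element, and mean() then returns NA:

```r
# Hypothetical data: one missing group label, no missing values in dc1
groups <- c("Group_A", "Group_B", NA, "Group_A")
dc1 <- c(10, 20, 30, 40)

idx <- groups == "Group_A"    # TRUE FALSE NA TRUE
subset_vals <- dc1[idx]       # 10 NA 40 (the NA index yields an NA element)
result <- mean(subset_vals)   # NA, because na.rm defaults to FALSE
```

Note that the NA in subset_vals comes from the index, not from dc1 itself.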

If the function being applied has an option to remove missing values, it can be used. In the case of mean, there is na.rm, which is FALSE by default. Change it to TRUE and it should work:

mean(x$DC1[x$Groups == "Group_A"], na.rm = TRUE)

Another option is to make the NA comparisons return FALSE. This can be done by adding another logical condition, & !is.na(...):

mean(x$DC1[x$Groups=="Group_A" & !is.na(x$Groups)])

If there are no NA values in 'DC1', this works fine. To be safe, it may be better to add na.rm = TRUE as well.

A third option is to use %in%, which always returns TRUE or FALSE, never NA:

mean(x$DC1[x$Groups %in% "Group_A"])
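The three options can be compared side by side on a toy data frame. This is only a sketch: the values below are invented for illustration, and x is assumed to have NA values in both 'Groups' and 'DC1'. It suggests that only the na.rm = TRUE variants handle NA values in 'DC1' itself:

```r
# Hypothetical data frame with NA in both columns
x <- data.frame(
  Groups = c("Group_A", "Group_A", NA, "Group_B", "Group_A"),
  DC1    = c(1, NA, 5, 7, 3)
)

# Option 1: na.rm drops NAs from both the index and DC1
opt1 <- mean(x$DC1[x$Groups == "Group_A"], na.rm = TRUE)             # 2

# Options 2 and 3 clean the index only; the NA in DC1 still propagates
opt2 <- mean(x$DC1[x$Groups == "Group_A" & !is.na(x$Groups)])        # NA
opt3 <- mean(x$DC1[x$Groups %in% "Group_A"])                         # NA

# Adding na.rm = TRUE makes them agree with option 1
opt2_safe <- mean(x$DC1[x$Groups == "Group_A" & !is.na(x$Groups)],
                  na.rm = TRUE)                                      # 2
opt3_safe <- mean(x$DC1[x$Groups %in% "Group_A"], na.rm = TRUE)      # 2
```

For the situation described in the question, where the NA values are in 'DC1', na.rm = TRUE is the part that matters.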

Upvotes: 2

divibisan

Reputation: 12155

Two options from @akrun:

mean(x$DC1[x$Groups == "Group_A"], na.rm = TRUE)

or

mean(x$DC1[x$Groups == "Group_A" & !is.na(x$Groups)])

Upvotes: 2
