How to subset dataframe on factor levels when NA are present

Question

I'd like to subset a dataframe on factor levels but struggle to do so when NAs are present. Here are two comparative dataframes, one without NA in the factor column, one with NA:

df1 <- data.frame(v = c("ABC", "def", "ABC", "ghi"), 
                  f = c(4.11, 3.22, NA, 7.44))

df2 <- data.frame(v = c(NA, "ABC", "def", "ABC", "ghi"), 
                  f = c(2.33, 4.11, 3.22, NA, 7.44))

In df1, subsetting on factor levels works nicely. For example:

df1[!df1$v == "ABC",]
    v    f
2 def 3.22
4 ghi 7.44

By contrast, subsetting in df2 is fraught with problems:

df2[!df2$v == "ABC",]
      v    f
NA <NA>   NA
3   def 3.22
5   ghi 7.44

The problems are twofold: (i) the row with <NA> in df2$v is included, whereas it shouldn't be, and (ii) the value next to it (i.e. the value on the same row under df2$f) is NA, whereas that value should be 2.33.

How can I subset df2 cleanly and correctly, so that the outcome is this:

      v    f
3   def 3.22
5   ghi 7.44

Ric S · Accepted Answer

You can use the following line of code:

df2[!(df2$v == "ABC") & !is.na(df2$v), ]

#     v    f
# 3 def 3.22
# 5 ghi 7.44

Or this line, which I prefer because it avoids an extra pair of parentheses:

df2[df2$v != "ABC" & !is.na(df2$v), ]

#     v    f
# 3 def 3.22
# 5 ghi 7.44
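
As a rough sketch of why the extra row shows up in the first place: comparing NA with a string gives NA rather than TRUE or FALSE, and indexing a data frame with an NA logical returns a row filled with NA. The snippet below reuses df2 from the question and shows a few base R alternatives that should behave differently with respect to NA. The names keep, res_which, res_in and res_subset are only illustrative.

```r
# Same data as df2 in the question
df2 <- data.frame(v = c(NA, "ABC", "def", "ABC", "ghi"),
                  f = c(2.33, 4.11, 3.22, NA, 7.44))

# Comparing NA with a value yields NA, so the first element is NA
keep <- df2$v != "ABC"
print(keep)  # values: NA, FALSE, TRUE, FALSE, TRUE

# which() returns only the positions that are TRUE, so the NA is dropped
res_which <- df2[which(keep), ]

# subset() treats an NA in the condition as FALSE
res_subset <- subset(df2, v != "ABC")

# %in% never returns NA; note that this variant KEEPS the NA row
# (with its real f value of 2.33), because NA is simply "not ABC"
res_in <- df2[!df2$v %in% "ABC", ]
```

The which() and subset() variants should give the same two rows as the lines above, whereas the %in% variant is only appropriate if the row with NA in v ought to be retained.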
