I'd like to subset a dataframe on factor levels but struggle to do so when NAs are present. Here are two comparative dataframes, one without NA in the factor column, one with NA:
df1 <- data.frame(v = c("ABC", "def", "ABC", "ghi"),
                  f = c(4.11, 3.22, NA, 7.44))
df2 <- data.frame(v = c(NA, "ABC", "def", "ABC", "ghi"),
                  f = c(2.33, 4.11, 3.22, NA, 7.44))
In df1, subsetting on factor levels works nicely. For example:
df1[!df1$v == "ABC",]
v f
2 def 3.22
4 ghi 7.44
By contrast, subsetting in df2 is fraught with problems:
df2[!df2$v == "ABC",]
v f
NA <NA> NA
3 def 3.22
5 ghi 7.44
The problems are twofold: (i) the row with <NA> in df2$v is included when it shouldn't be, and (ii) the value next to it (i.e. the value in the same row under df2$f) is NA, whereas it should be 2.33.
How can I subset df2 cleanly and correctly, so that the outcome is this:
v f
3 def 3.22
5 ghi 7.44
Upvotes: 1
You can use the following line of code:
df2[!(df2$v == "ABC") & !is.na(df2$v), ]
# v f
# 3 def 3.22
# 5 ghi 7.44
Or this line, which I prefer because it saves a pair of parentheses:
df2[df2$v != "ABC" & !is.na(df2$v), ]
# v f
# 3 def 3.22
# 5 ghi 7.44
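As for why the stray row appears in the first place: comparing NA with a string returns NA rather than TRUE or FALSE, and a logical index containing NA makes `[` return a row filled entirely with NA, which is why 2.33 seems to disappear. The sketch below rebuilds df2 from the question and shows a few base R alternatives that avoid the explicit is.na() check. The variable names (keep, res_which, res_subset, res_in) are only illustrative, and this is one possible approach rather than the only correct one.

```r
# Same data as df2 in the question
df2 <- data.frame(v = c(NA, "ABC", "def", "ABC", "ghi"),
                  f = c(2.33, 4.11, 3.22, NA, 7.44))

# Comparing with NA yields NA, not TRUE or FALSE
keep <- df2$v != "ABC"
print(keep)  # NA FALSE TRUE FALSE TRUE

# which() keeps only the positions that are TRUE, silently dropping NA
res_which <- df2[which(keep), ]

# subset() treats NA in the condition as FALSE
res_subset <- subset(df2, v != "ABC")

# %in% never returns NA; listing NA among the excluded values drops that row too
res_in <- df2[!(df2$v %in% c("ABC", NA)), ]

print(res_which)
```

All three variants should leave only rows 3 and 5. Note that which() has a pitfall when negated: if nothing matches, df2[-which(cond), ] returns zero rows, so indexing with which() on the rows to keep, as above, is the safer form.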
Upvotes: 1