staove7

Reputation: 580

Different results for 2 subset data methods in R

I'm subsetting my data, and I'm getting different results from the following two lines of code:

subset(df, x==1)
df[df$x==1,]

x's type is integer

Am I doing something wrong? Thank you in advance

Upvotes: 0

Views: 83

Answers (1)

coffeinjunky

Reputation: 11514

Without example data, it is difficult to say what your problem is. However, my hunch is that the following probably explains your problem:

df <- data.frame(quantity=c(1:3, NA), item=c("Coffee", "Americano", "Espresso", "Decaf"))
df
quantity      item
       1    Coffee
       2 Americano
       3  Espresso
      NA     Decaf

Let's subset with [:

df[df$quantity == 2,]
 quantity      item
        2 Americano
       NA      <NA>

Now let's subset with subset:

subset(df, quantity == 2)
quantity      item
       2 Americano

We see that the two methods differ in how they treat NA values. I think of it this way: with subset, you are explicitly asking for the rows for which the condition is verifiably true. df$quantity == 2 produces a logical vector, but where quantity is missing, the comparison can be neither TRUE nor FALSE, so the result is NA. This is why we get the following output with an NA at the end:

df$quantity==2
[1] FALSE  TRUE FALSE    NA
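
The NA appears because any comparison with a missing value is itself missing. A small sketch in base R, re-creating the example data so it runs on its own (the name cond is only for illustration):

```r
# Re-create the example data so the snippet is self-contained
df <- data.frame(quantity = c(1:3, NA),
                 item = c("Coffee", "Americano", "Espresso", "Decaf"))

NA == 2                     # comparing with a missing value gives NA
cond <- df$quantity == 2    # logical vector with an NA in the last position
is.na(cond)                 # flags the element that is neither TRUE nor FALSE
sum(cond, na.rm = TRUE)     # counts the rows that verifiably match
```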

The function [ accepts the logical vector df$quantity == 2, but an NA in a logical index does not select the original row; it yields a row in which every value is NA. This is why, instead of NA Decaf, we get NA <NA>. If you prefer using [, you could use the following instead:

df[which(df$quantity == 2),]
quantity      item
       2 Americano

This translates the logical condition df$quantity == 2 into a vector of row numbers where the logical condition is "verifiably" satisfied.
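
Two other base R ways to keep [ while dropping the NA row, sketched on the same example data (the result names by_isna, by_in and by_subset are only for illustration). Each should keep just the Americano row, as subset does:

```r
# Re-create the example data so the snippet is self-contained
df <- data.frame(quantity = c(1:3, NA),
                 item = c("Coffee", "Americano", "Espresso", "Decaf"))

# Explicitly exclude rows where the condition cannot be evaluated
by_isna <- df[!is.na(df$quantity) & df$quantity == 2, ]

# %in% never returns NA, so a missing quantity simply does not match
by_in <- df[df$quantity %in% 2, ]

# subset() for comparison
by_subset <- subset(df, quantity == 2)

by_isna
by_in
by_subset
```

The %in% form is the most compact, while the is.na form spells out the intent; which one reads better is a matter of taste.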

Upvotes: 5
