Reputation: 580
I'm subsetting my data, and I'm getting different results from the following two expressions:
subset(df, x==1)
df[df$x==1,]
x's type is integer.
Am I doing something wrong? Thank you in advance
Upvotes: 0
Views: 83
Reputation: 11514
Without example data, it is difficult to say what your problem is. However, my hunch is that the following probably explains your problem:
df <- data.frame(quantity=c(1:3, NA), item=c("Coffee", "Americano", "Espresso", "Decaf"))
df
  quantity      item
1        1    Coffee
2        2 Americano
3        3  Espresso
4       NA     Decaf
Let's subset with [:
df[df$quantity == 2,]
   quantity      item
2         2 Americano
NA       NA      <NA>
Now let's subset with subset:
subset(df, quantity == 2)
  quantity      item
2        2 Americano
We see that the subsetting output differs depending on how NA values are treated. I think of this as follows: with subset, you are explicitly stating that you want the rows for which the condition is verifiably true. df$quantity == 2 produces a vector of TRUE/FALSE values, but where quantity is missing, it is impossible to assign TRUE or FALSE. This is why we get the following output with an NA at the end:
df$quantity==2
[1] FALSE  TRUE FALSE    NA
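The effect of such an NA in a logical index can be seen in isolation on a plain vector. This is only a minimal sketch: the vector v and the index idx are made up for illustration and are not part of the data above.

```r
# Hypothetical toy vector, used only to illustrate the indexing rule
v <- c(10, 20, 30, 40)

# Same shape as the result of df$quantity == 2
idx <- c(FALSE, TRUE, FALSE, NA)

# An NA in a logical index produces an NA element in the result
with_na <- v[idx]

# which() keeps only the positions where idx is TRUE
without_na <- v[which(idx)]

print(with_na)
print(without_na)
```

Indexing the rows of a data frame follows the same rule, except that the NA entry becomes a whole row of NA values.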
The function [ takes this vector as a logical index. An NA in that index neither selects nor drops the row; instead, [ returns a row filled entirely with NA for it, which is why instead of NA Decaf we get NA <NA>. If you prefer using [, you could use the following instead:
df[which(df$quantity == 2),]
  quantity      item
2        2 Americano
This translates the logical condition df$quantity == 2 into a vector of row numbers where the condition is "verifiably" satisfied: which() returns only the positions that are TRUE and drops both FALSE and NA.
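As a rough sketch, here are a few ways to keep only the rows where the condition is TRUE, using the same example data. The explicit !is.na() guard and the %in% variant are alternatives I am adding for comparison; they are not part of the approach described above, and the variable names are illustrative.

```r
# Same example data as in the answer
df <- data.frame(quantity = c(1:3, NA),
                 item = c("Coffee", "Americano", "Espresso", "Decaf"))

# subset() drops rows where the condition evaluates to NA
by_subset <- subset(df, quantity == 2)

# which() converts the logical vector to the positions that are TRUE
by_which <- df[which(df$quantity == 2), ]

# An explicit guard turns the NA comparison into FALSE
by_guard <- df[!is.na(df$quantity) & df$quantity == 2, ]

# %in% never returns NA, so missing values simply do not match
by_in <- df[df$quantity %in% 2, ]

# Each result contains only the Americano row
print(by_subset)
print(by_which)
print(by_guard)
print(by_in)
```

Which form to use is mostly a matter of taste; the guard makes the NA handling explicit, while which() and %in% are shorter.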
Upvotes: 5