Reputation: 1091
I have a data frame ma with a factor column called type. The levels of type are I210, I210plus, I210plusc, KV2c, and KV2cplus.
I'd like to put some of these levels in a vector, say selected_types:
selected_types <- c("I210plusc", "KV2c")
and then use that vector to subset ma:
ma1 <- subset(ma, type == selected_types)
so that ma1 contains only the observations whose type is I210plusc or KV2c.
However, when I do this, ma1 has fewer observations than the sum of the occurrences of those two types in the original ma.
Any ideas on what I'm doing incorrectly?
Thank you
Upvotes: 0
Views: 115
Reputation: 52637
I originally had this in a comment, but it's a bit lengthy, and I wanted to add to it. Here are some details on what's happening:
== recycles your length-two vector along type, so every odd row is compared to "I210plusc" and every even row is compared to "KV2c". Your final result is therefore the data frame of odd rows that are "I210plusc" and even rows that are "KV2c". Matching rows that fall in the "wrong" position are silently dropped, which is why ma1 comes up short.
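To see the recycling concretely, here is a minimal sketch on made-up data. The six-row ma below and its id column are hypothetical stand-ins (the real ma isn't shown in the question); only type and selected_types come from the original post.

```r
# Hypothetical toy data standing in for the real ma
ma <- data.frame(
  id   = 1:6,
  type = factor(c("KV2c", "KV2c", "I210plusc", "I210plusc", "I210", "KV2c"))
)
selected_types <- c("I210plusc", "KV2c")

# What each row is effectively compared against once == recycles the vector
recycled <- rep(selected_types, length.out = nrow(ma))

# Side by side: the row's type, the value it is compared to, and the outcome
print(data.frame(type = ma$type,
                 compared_to = recycled,
                 keep = ma$type == selected_types))

# Only rows whose type happens to line up with the recycled value survive
ma1 <- subset(ma, type == selected_types)
nrow(ma1)                            # rows kept by ==
sum(ma$type %in% selected_types)     # rows that actually have a selected type
```

In this toy example five rows have one of the two selected types, but only three of them sit at a position where the recycled value matches, so == keeps three rows.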
An alternate solution that might make the issue clear is as follows:
subset(ma, type == selected_types[[1]] | type == selected_types[[2]])
Or, more gracefully:
subset(ma, type %in% selected_types)
The %in% operator returns a logical vector of the same length as type, with TRUE for every position in type that "is in" selected_types (hence the name of the operator).
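As a quick check of that behaviour, the sketch below runs %in% on made-up data and confirms that the subset has as many rows as the two types occur in total. The toy ma and its id column are assumptions for illustration, not the asker's actual data.

```r
# Hypothetical toy data standing in for the real ma
ma <- data.frame(
  id   = 1:6,
  type = factor(c("KV2c", "KV2c", "I210plusc", "I210plusc", "I210", "KV2c"))
)
selected_types <- c("I210plusc", "KV2c")

# One TRUE/FALSE per row of ma; no recycling of selected_types is involved
keep <- ma$type %in% selected_types
length(keep) == nrow(ma)

ma1 <- subset(ma, type %in% selected_types)

# Row count of the subset equals the summed occurrences of the selected types
expected <- sum(table(ma$type)[selected_types])
nrow(ma1) == expected
```

Because %in% tests each element of type for membership, the order and length of selected_types don't affect which rows are kept.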
Upvotes: 4