subset indexing in r

Question

I have a dataframe ma

it has a factor called type

type has the following levels: I210, I210plus, I210plusc, KV2c, KV2cplus

I'd like to put some of these levels in a vector, say, selected_types

so, selected_types<-c("I210plusc","KV2c")

then, have this command subset the dataframe ma

ma1<-subset(ma, type==selected_types)

such that ma1 would be a subset of ma consisting of only the observations that had

type I210plusc and KV2c

however, when I do this, the number of observations in the resulting dataframe ma1 is less than the sum of the occurrences of the two types in selected_types from the original ma

Any ideas on what I'm doing incorrectly?

Thank you

BrodieG · Accepted Answer

I originally had this in a comment, but it's a bit lengthy, plus I wanted to add to it. Here are some details on what's happening:

What you're doing with == is recycling your length-two vector, so that every odd row is compared to "I210plusc" and every even row to "KV2c". Your final result will therefore be the data frame of odd rows that are "I210plusc" and even rows that are "KV2c", which is why ma1 has fewer rows than you expect.
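To make the recycling visible, here is a small sketch on made-up data. The six-row data frame and the id column are assumptions for illustration; only the names ma, type and selected_types come from the question.

```r
# Hypothetical toy data standing in for the real `ma`
ma <- data.frame(
  id   = 1:6,
  type = factor(c("KV2c", "KV2c", "I210plusc", "I210plusc", "I210", "KV2c"))
)
selected_types <- c("I210plusc", "KV2c")

# What == effectively compares against: selected_types repeated
# until it is as long as ma$type
recycled <- rep(selected_types, length.out = nrow(ma))

# Element-wise comparison, row i against recycled[i]
keep <- ma$type == selected_types

# Side-by-side view of each row, its comparison value, and the result
print(data.frame(type = ma$type, compared_to = recycled, keep = keep))

# Only rows 2, 3 and 6 survive, although 5 rows have one of the two types
ma1 <- subset(ma, type == selected_types)
print(nrow(ma1))
```

Rows 1 and 4 hold a wanted type but are dropped because they happen to line up with the other element of selected_types.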

An alternate solution that might make the issue clear is as follows:

subset(ma, type == selected_types[[1]] | type == selected_types[[2]])

Or, more gracefully:

subset(ma, type %in% selected_types)

The %in% operator returns a logical vector of the same length as type, with TRUE for every position in type that "is in" selected_types (hence the name of the operator).
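A quick way to check this behaviour is on a small example. The data below is invented for illustration (only ma, type and selected_types are names from the question), and the droplevels() step at the end is an optional extra, not something the question asked for.

```r
# Hypothetical toy data standing in for the real `ma`
ma <- data.frame(
  id   = 1:6,
  type = factor(c("KV2c", "KV2c", "I210plusc", "I210plusc", "I210", "KV2c"))
)
selected_types <- c("I210plusc", "KV2c")

# One TRUE/FALSE per row of ma, regardless of how long selected_types is
keep_in <- ma$type %in% selected_types
print(keep_in)

# Every row whose type is one of the selected types is kept
ma1 <- subset(ma, type %in% selected_types)
print(nrow(ma1))

# Optional: subset() keeps unused factor levels; drop them if they get in the way
ma1$type <- droplevels(ma1$type)
print(levels(ma1$type))
```

Here the row count of ma1 equals the number of rows in ma whose type is either of the two selected values, which is the behaviour the question was after.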
