I'm working on a data frame (account) with two columns: the "posting" IP location (in the column city) and the location at the time each account was first registered (in the column register). I'm using grepl() to subset the rows whose posting location and register location are both in the state of New York (NY). Below are part of the data and my code for subsetting the desired output:
account <- data.frame(city = c("Beijing, China", "New York, NY", "Hoboken, NJ", "Los Angeles, CA", "New York, NY", "Bloomington, IN"),
register = c("New York, NY", "New York, NY", "Wilwaukee, WI", "Rochester, NY", "New York, NY", "Tokyo, Japan"))
sub_data <- subset(account, grepl("NY", city) == "NY" & grepl("NY", register) == "NY")
sub_data
[1] city register
<0 rows> (or 0-length row.names)
My code didn't work and returned 0 rows, although at least two rows should have met my selection criterion. What went wrong in my code? I consulted this previous thread before posting this question.
Upvotes: 1
Views: 462
The function grepl already returns a logical vector, so just use the following:
sub_data <- subset(account,
grepl("NY", city) & grepl("NY", register)
)
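By using something like grepl("NY", city) == "NY" you are asking R whether each value in FALSE TRUE FALSE FALSE TRUE FALSE is equal to "NY". For that comparison R coerces the logical values to the strings "FALSE" and "TRUE", none of which equals "NY", so every element of the result is FALSE and no rows are selected.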
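To see this for yourself, here is a small sketch using the sample data from the question. The intermediate names (in_ny_city, compared, both_ny) are just illustrative, and the anchored pattern ", NY$" rests on my assumption that the state code always comes last, after a comma and a space; it is only there to avoid matching "NY" elsewhere in a string.

```r
# Sample data copied from the question
account <- data.frame(
  city = c("Beijing, China", "New York, NY", "Hoboken, NJ",
           "Los Angeles, CA", "New York, NY", "Bloomington, IN"),
  register = c("New York, NY", "New York, NY", "Wilwaukee, WI",
               "Rochester, NY", "New York, NY", "Tokyo, Japan")
)

# grepl() already gives one TRUE/FALSE per row
in_ny_city <- grepl("NY", account$city)

# Comparing that logical vector to "NY" coerces it to "TRUE"/"FALSE",
# so no element can ever equal "NY"
compared <- in_ny_city == "NY"
any(compared)

# Assumption: the state code sits at the end of the string after ", ",
# so the pattern is anchored with $ to match only there
both_ny <- grepl(", NY$", account$city) & grepl(", NY$", account$register)
sub_data <- account[both_ny, ]
sub_data
```

With this sample data, sub_data should contain the two rows where both columns are "New York, NY". Indexing with account[both_ny, ] is equivalent to the subset() call above; use whichever you find more readable.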
Upvotes: 1