tumultous_rooster

Reputation: 12560

Wanting to preserve duplicates when matching in R, while working with indices

I have been working with the zipcode package in R.

I am trying to look up some zip codes I have in a vector, so that I can get the longitude and latitude that correspond to them (the zipcode package provides a data frame with this information).

Unfortunately, I'm running into a problem with duplicates. Note that myzips contains a duplicate (94121).

library(zipcode)
data(zipcode)
myzips <- c(95125, 94121, 94121, 94601, 94025)  # 94121 appears twice
matchedzips <- zipcode[zipcode$zip %in% myzips, ]

I get

        zip          city state latitude longitude
41275 94025    Menlo Park    CA 37.45169 -122.1839
41334 94121 San Francisco    CA 37.77873 -122.4926
41564 94601       Oakland    CA 37.77683 -122.2179
41756 95125      San Jose    CA 37.29509 -121.8965

which is great, but I really need

        zip          city state latitude longitude
41275 94025    Menlo Park    CA 37.45169 -122.1839
41334 94121 San Francisco    CA 37.77873 -122.4926
41334 94121 San Francisco    CA 37.77873 -122.4926
41564 94601       Oakland    CA 37.77683 -122.2179
41756 95125      San Jose    CA 37.29509 -121.8965

I'm surprised that this subsetting isn't working. I fared no better with

matchedzips <- zipcode[which(zipcode$zip %in% myzips), ]

How do I solve this particular problem, and, more importantly, what mechanism is causing my duplicates to be ignored? Thanks in advance for any advice.

Upvotes: 0

Views: 44

Answers (1)

Roland

Reputation: 132874

Use match instead of %in%:

DF <- read.table(text='  zip          city state latitude longitude
41275 94025    "Menlo Park"    CA 37.45169 -122.1839
41334 94121 "San Francisco"    CA 37.77873 -122.4926
41564 94601       "Oakland"    CA 37.77683 -122.2179
41756 95125      "San Jose"    CA 37.29509 -121.8965', header=TRUE)

myzips <- c(95125,94121,94121,94601, 94025)  
DF[match(myzips, DF$zip), ]

#           zip          city state latitude longitude
# 41756   95125      San Jose    CA 37.29509 -121.8965
# 41334   94121 San Francisco    CA 37.77873 -122.4926
# 41334.1 94121 San Francisco    CA 37.77873 -122.4926
# 41564   94601       Oakland    CA 37.77683 -122.2179
# 41275   94025    Menlo Park    CA 37.45169 -122.1839
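The rows above come back in the order of myzips. If you want them sorted by zip code, as in the question's desired output, one option is to sort myzips before matching. Below is a minimal sketch on a small data frame built from the rows shown above (a stand-in for the full zipcode table; the variable names are illustrative). It also covers a case the answer doesn't: match() returns NA for a zip code that isn't in the table, and indexing a data frame with NA yields a row of NAs, so the sketch drops those indices first. The code 99999 is a made-up value used only to trigger the NA case.

```r
# Small lookup table built from the rows shown above (stand-in for zipcode)
DF <- data.frame(
  zip  = c(94025, 94121, 94601, 95125),
  city = c("Menlo Park", "San Francisco", "Oakland", "San Jose"),
  row.names = c("41275", "41334", "41564", "41756")
)
myzips <- c(95125, 94121, 94121, 94601, 94025)

# Sort the query first; match() then returns indices in ascending zip order,
# still one per element of myzips, so 94121 appears twice
matched_sorted <- DF[match(sort(myzips), DF$zip), ]
print(matched_sorted)

# match() yields NA for codes not in the table; drop them before subsetting
lookup <- c(94121, 99999)  # 99999: made-up code absent from DF
idx <- match(lookup, DF$zip)
found <- DF[idx[!is.na(idx)], ]
print(found)
```

Sorting the query rather than the result keeps the one-index-per-query property of match(), so duplicates survive either way.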

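zipcode$zip %in% myzips returns one TRUE/FALSE value per element of zipcode$zip, answering "is this value of zipcode$zip present in myzips?". Since each zip code occurs only once in zipcode$zip, it can be selected only once, no matter how many times it appears in myzips. match(myzips, zipcode$zip) works the other way around: it returns one row index per element of myzips, so duplicates are kept, in the order of myzips.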
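To see the mechanism in isolation, here is a minimal sketch on plain numeric vectors (table_zips is a hypothetical stand-in for zipcode$zip). R's help page for match defines x %in% table as match(x, table, nomatch = 0) > 0, so the result always has the length of the left-hand vector. which() merely converts its TRUE positions to indices, which is why the which() variant in the question returns the same rows.

```r
table_zips <- c(94025, 94121, 94601, 95125)  # each code once, like zipcode$zip
myzips     <- c(95125, 94121, 94121, 94601, 94025)

# %in%: "is each element of the LEFT vector found in the right one?"
# Length is length(table_zips) = 4, regardless of duplicates in myzips
in_flags <- table_zips %in% myzips
print(in_flags)         # [1] TRUE TRUE TRUE TRUE
print(which(in_flags))  # [1] 1 2 3 4  (only the positions of TRUE values)

# match(): "where is each element of the FIRST vector in the second?"
# Length is length(myzips) = 5, so the duplicate yields a repeated index
positions <- match(myzips, table_zips)
print(positions)        # [1] 4 2 2 3 1
```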

Upvotes: 3
