Reputation: 12560
I have been working with the zipcode package in R.
I am trying to look up some zipcodes I have in a vector, so that I can get the longitude and latitude that correspond to them (the zipcode package consists of a dataframe that has this information).
Unfortunately, I'm running into a problem with duplicates. Note that myzips has a duplicate.
library(zipcode)
data(zipcode)
myzips <- c(95125,94121,94121,94601, 94025)
matchedzips <- zipcode[ zipcode$zip %in% myzips, ]
I get
zip city state latitude longitude
41275 94025 Menlo Park CA 37.45169 -122.1839
41334 94121 San Francisco CA 37.77873 -122.4926
41564 94601 Oakland CA 37.77683 -122.2179
41756 95125 San Jose CA 37.29509 -121.8965
which is great, but I really need
zip city state latitude longitude
41275 94025 Menlo Park CA 37.45169 -122.1839
41334 94121 San Francisco CA 37.77873 -122.4926
41334 94121 San Francisco CA 37.77873 -122.4926
41564 94601 Oakland CA 37.77683 -122.2179
41756 95125 San Jose CA 37.29509 -121.8965
I'm surprised that this subsetting isn't working. I fared no better with
matchedzips <- zipcode[ which(zipcode$zip %in% myzips) , ]
How do I solve this particular problem, or more importantly, what is the mechanism at work here that is serving to ignore my duplicates? Thanks for any advice in advance.
Upvotes: 0
Views: 44
Reputation: 132874
Use match instead of %in%:
DF <- read.table(text=' zip city state latitude longitude
41275 94025 "Menlo Park" CA 37.45169 -122.1839
41334 94121 "San Francisco" CA 37.77873 -122.4926
41564 94601 "Oakland" CA 37.77683 -122.2179
41756 95125 "San Jose" CA 37.29509 -121.8965', header=TRUE)
myzips <- c(95125,94121,94121,94601, 94025)
DF[match(myzips, DF$zip), ]
# zip city state latitude longitude
# 41756 95125 San Jose CA 37.29509 -121.8965
# 41334 94121 San Francisco CA 37.77873 -122.4926
# 41334.1 94121 San Francisco CA 37.77873 -122.4926
# 41564 94601 Oakland CA 37.77683 -122.2179
# 41275 94025 Menlo Park CA 37.45169 -122.1839
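zipcode$zip %in% myzips means "for each value of zipcode$zip, is it in myzips?". It returns one TRUE/FALSE per row of zipcode, not one per element of myzips. Since each value is in zipcode$zip only once, its row can be in the result only once, no matter how often it appears in myzips. match(myzips, DF$zip) works the other way round: it returns one row index per element of myzips, so duplicates (and the order of myzips) are preserved.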
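To make the difference concrete, here is a minimal sketch on a toy data frame. It assumes a four-row stand-in for the zipcode data (zips and cities copied from the question); the zip 99999 at the end is a made-up value used only to show what happens when there is no match.

```r
# Toy stand-in for the zipcode data frame (values taken from the question).
DF <- data.frame(
  zip  = c(94025, 94121, 94601, 95125),
  city = c("Menlo Park", "San Francisco", "Oakland", "San Jose"),
  stringsAsFactors = FALSE
)
myzips <- c(95125, 94121, 94121, 94601, 94025)

# %in% asks "is this row's zip anywhere in myzips?".
# The result has one element per row of DF, so each row is kept at most once.
keep <- DF$zip %in% myzips
length(keep)   # 4, the number of rows in DF

# match() asks "at which row of DF does each element of myzips sit?".
# The result has one element per entry of myzips, so repeats and order survive.
idx <- match(myzips, DF$zip)
idx            # 4 2 2 3 1

# Cities in the order of myzips:
# San Jose, San Francisco, San Francisco, Oakland, Menlo Park
cities <- DF[idx, "city"]

# A zip that is not in DF (99999 is a hypothetical example) gives NA,
# and indexing with NA yields a row of NAs rather than dropping it.
idx_missing <- match(c(94121, 99999), DF$zip)
idx_missing    # 2 NA
```

If unmatched zips should be dropped rather than kept as NA rows, one option is to filter the index first, e.g. DF[idx_missing[!is.na(idx_missing)], ].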
Upvotes: 3