Fhertzp
Fhertzp

Reputation: 11

How can I set the town name based on unique zipcode in R

I have a dataset with two columns, towns and zip codes. I would like it if every ZIP code had one town rather than having several different towns spread out. That is, I was thinking something like:

for every unique zip-code set all town.names equal to the most frequent town name for that zipcode

So far I have tried something like:

unique.zipcode <- mydata$zipcode[!duplicated(mydata$zipcode)

for(i in 1:length(unique.zipcode){
    mydata$town <- names(which.max(table(mydata$town[mydata$zipcode==unique.zipcode[i]])))
}

Upvotes: 1

Views: 49

Answers (1)

hrbrmstr
hrbrmstr

Reputation: 78822

Simulate some data since you didn't provide any:

library(dplyr)

set.seed(1492)

data_frame(
  town=sample(LETTERS[1:6], 100, replace=TRUE),
  zip=sample(c("10010", "21000", "30100", "40001"), 100, replace=TRUE)
) -> df

Take a look:

glimpse(df)
## Observations: 100
## Variables: 2
## $ town <chr> "B", "B", "B", "A", "A", "A", "F", "D", "C", "E", "D", "D...
## $ zip  <chr> "21000", "30100", "21000", "21000", "10010", "21000", "40...

Prove we have dups:

count(df, zip)
## # A tibble: 4 × 2
##     zip     n
##   <chr> <int>
## 1 10010    27
## 2 21000    24
## 3 30100    19
## 4 40001    30

Eliminate them by frequency & deal with ties.

count(df, zip, town) %>%
  top_n(1) %>%
  slice(1)
## Source: local data frame [4 x 3]
## Groups: zip [4]
## 
##     zip  town     n
##   <chr> <chr> <int>
## 1 10010     C     7
## 2 21000     D     6
## 3 30100     F     6
## 4 40001     D     8

Upvotes: 1

Related Questions