Seydou GORO
Seydou GORO

Reputation: 1285

Make a string after grouping

Here is my problem. I've got data on city codes (GeoCode) and zip codes (PostCode). Often several zip codes correspond to a single city code. If that's the case, I want to make a column with a string of zip codes corresponding to the same city:

ID<-1:10
GeoCode<-c("AA","BB","BB","CC","CC","CC","DD","DD","DD","DD")
PostCode<-c("01","10","11","20","21","22","30","31","32","33")
data<-data.frame(ID,GeoCode,PostCode)

I want to make such table. For example "20_21_22" belong to City code CC

   ID GeoCode PostCode strPostcode
1   1      AA       01          01
2   2      BB       10       10_11
3   3      BB       11       10_11
4   4      CC       20    20_21_22
5   5      CC       21    20_21_22
6   6      CC       22    20_21_22
7   7      DD       30 30_31_32_33
8   8      DD       31 30_31_32_33
9   9      DD       32 30_31_32_33
10 10      DD       33 30_31_32_33

Upvotes: 1

Views: 56

Answers (3)

Mike V
Mike V

Reputation: 1354

Or you can try this one

data2 <- data %>% 
  group_by(GeoCode) %>% 
  mutate(strPostCode = paste0(unique(PostCode), collapse = "_"))
# ID  GeoCode PostCode strPostCode
# <int> <chr>   <chr>    <chr>      
# 1     1 AA      01       01         
# 2     2 BB      10       10_11      
# 3     3 BB      11       10_11      
# 4     4 CC      20       20_21_22   
# 5     5 CC      21       20_21_22   
# 6     6 CC      22       20_21_22   
# 7     7 DD      30       30_31_32_33
# 8     8 DD      31       30_31_32_33
# 9     9 DD      32       30_31_32_33
# 10    10 DD     33       30_31_32_33

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101139

The base R option with ave by @akrun is efficient. Here is another workaround

merge(data,
  aggregate(PostCode ~ ., data[-1], paste0, collapse = "_"),
  by = "GeoCode",
  all = TRUE
)

which gives

   GeoCode ID PostCode.x  PostCode.y
1       AA  1         01          01
2       BB  2         10       10_11
3       BB  3         11       10_11
4       CC  4         20    20_21_22
5       CC  5         21    20_21_22
6       CC  6         22    20_21_22
7       DD  7         30 30_31_32_33
8       DD  8         31 30_31_32_33
9       DD  9         32 30_31_32_33
10      DD 10         33 30_31_32_33

Upvotes: 1

akrun
akrun

Reputation: 886968

We could group by 'GeoCode' and paste all the unique 'PostCode' in mutate

library(dplyr)
library(stringr)
data %>% 
    group_by(GeoCode) %>%
    mutate(strPostcode = str_c(unique(PostCode), collapse="_"))
# A tibble: 10 x 4
# Groups:   GeoCode [4]
#      ID GeoCode PostCode strPostcode
#   <int> <chr>   <chr>    <chr>      
# 1     1 AA      01       01         
# 2     2 BB      10       10_11      
# 3     3 BB      11       10_11      
# 4     4 CC      20       20_21_22   
# 5     5 CC      21       20_21_22   
# 6     6 CC      22       20_21_22   
# 7     7 DD      30       30_31_32_33
# 8     8 DD      31       30_31_32_33
# 9     9 DD      32       30_31_32_33
#10    10 DD      33       30_31_32_33

Or an option with base R

data$strPostcode <- with(data, ave(PostCode, GeoCode, FUN = 
        function(x) paste(unique(x), collapse="_")))

Upvotes: 2

Related Questions