group by duplicate rows in dplyr

Question

I have following dataframe in R

   Ship_No     Number     
    1244         1
    1244         2
    1244         3
    1244         1
    1245         10
    1245         10
    1245         15
    1245         20
    1245         20

In want to find duplicate Number group by ship_No. My desired dataframe would be

   Ship_no      Number
    1244          1
    1245         10,20

Only numbers which have duplicated with comma seperated.

mt1022 · Accepted Answer

here is a solution using filter and summarize:

library(dplyr)
x <- read.table(textConnection('Ship_No     Number     
    1244         1
    1244         2
    1244         3
    1244         1
    1245         10
    1245         10
    1245         15
    1245         20
    1245         20'), header = TRUE)

x %>% group_by(Ship_No) %>%
    filter(duplicated(Number)) %>%
    summarize(Number = paste0(unique(Number), collapse = ','))

# # A tibble: 2 x 2
#   Ship_No Number
#       
# 1    1244      1
# 2    1245  10,20

group by duplicate rows in dplyr

Answers (2)

Related Questions