Neil
Neil

Reputation: 8247

group by duplicate rows in dplyr

I have following dataframe in R

   Ship_No     Number     
    1244         1
    1244         2
    1244         3
    1244         1
    1245         10
    1245         10
    1245         15
    1245         20
    1245         20

In want to find duplicate Number group by ship_No. My desired dataframe would be

   Ship_no      Number
    1244          1
    1245         10,20

Only numbers which have duplicated with comma seperated.

Upvotes: 0

Views: 6225

Answers (2)

akrun
akrun

Reputation: 886938

We can use data.table

library(data.table)
setDT(x)[duplicated(Number), .(Number = toString(unique(Number))) , Ship_No]
#     Ship_No Number
#1:    1244      1
#2:    1245 10, 20

Upvotes: 2

mt1022
mt1022

Reputation: 17289

here is a solution using filter and summarize:

library(dplyr)
x <- read.table(textConnection('Ship_No     Number     
    1244         1
    1244         2
    1244         3
    1244         1
    1245         10
    1245         10
    1245         15
    1245         20
    1245         20'), header = TRUE)

x %>% group_by(Ship_No) %>%
    filter(duplicated(Number)) %>%
    summarize(Number = paste0(unique(Number), collapse = ','))

# # A tibble: 2 x 2
#   Ship_No Number
#     <int>  <chr>
# 1    1244      1
# 2    1245  10,20

Upvotes: 5

Related Questions