CharlesLDN
CharlesLDN

Reputation: 181

In R with dplyr, how to aggregate one colum values if repeated dates are duplicated, by group

I have a > 30,000 rows tibble with ID colums, dates and acts. Sometimes, for an ID, two acts are registered the very same day. In that case would like to aggregate those acts.

Please find below a minimal working example (MWE):

> id <- c ("N02", "N02", "N02", "N02", "N03", "N03", "N05", "N05", "N05", "N05")
> date_1 <- c ("2008-03-15", "2008-03-15", "2008-04-15", "2008-04-15", "2008-06-15", "2008-07-15", "2014-03-06", "2014-03-07", "2014-03-07", "2014-03-13")
> act <- c ("YYYY050", "ZZNL040", "YYYY050", "ZZNL040", "ZZNL040", "ZZNL040", "ZZNL065", "ZZNL065", "ZZNL040", "ZZNL065")
> date_2 <- c ("2008-03-15", "2008-03-15", "2008-04-15", "2008-04-15", "2008-06-15", "2008-07-15", "2014-03-06", "2014-03-07", "2014-03-07", "2014-03-13")
> df1 <- data.frame (id, date_1, act, date_2)
> df1
    id     date_1     act     date_2
1  N02 2008-03-15 YYYY050 2008-03-15
2  N02 2008-03-15 ZZNL040 2008-03-15
3  N02 2008-04-15 YYYY050 2008-04-15
4  N02 2008-04-15 ZZNL040 2008-04-15
5  N03 2008-06-15 ZZNL040 2008-06-15
6  N03 2008-07-15 ZZNL040 2008-07-15
7  N05 2014-03-06 ZZNL065 2014-03-06
8  N05 2014-03-07 ZZNL065 2014-03-07
9  N05 2014-03-07 ZZNL040 2014-03-07
10 N05 2014-03-13 ZZNL065 2014-03-13

In present MWE, ID N02 has act YYYY050 and ZZNL040 the same day 2008-03-15.

In that case of mutiple acts the very same day, I would like aggregate the act values. I need to preserve the tibble structure id est date_1 and date_2 which are sometimes different for further calculations and, therefore use dplyr.

My desired output would be:

> id <- c ("N02", "N02", "N03", "N03", "N05", "N05", "N05")
> date_1 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2014-03-06", "2014-03-07", "2014-03-13")
> act <- c ("YYYY050/ZZNL040", "YYYY050/ZZNL040","ZZNL040", "ZZNL040", "ZZNL065", "ZZNL065/ZZNL040", "ZZNL065")
> date_2 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2014-03-06", "2014-03-07", "2014-03-13")
> df2 <- data.frame (id, date_1, act, date_2)
> df2
   id     date_1             act     date_2
1 N02 2008-03-15 YYYY050/ZZNL040 2008-03-15
2 N02 2008-04-15 YYYY050/ZZNL040 2008-04-15
3 N03 2008-06-15         ZZNL040 2008-06-15
4 N03 2008-07-15         ZZNL040 2008-07-15
5 N05 2014-03-06         ZZNL065 2014-03-06
6 N05 2014-03-07 ZZNL065/ZZNL040 2014-03-07
7 N05 2014-03-13         ZZNL065 2014-03-13

Any idea ? Thank you in advance for your help. Charles.

Upvotes: 1

Views: 32

Answers (1)

Allan Cameron
Allan Cameron

Reputation: 173793

You could do:

library(dplyr)

df1 %>%
  group_by(id, date_1, date_2) %>%
  summarise(act = paste(act, collapse = '/')) %>%
  select(1, 2, 4, 3)
#> # A tibble: 7 x 4
#> # Groups:   id, date_1 [7]
#>   id    date_1     act             date_2    
#>   <chr> <chr>      <chr>           <chr>     
#> 1 N02   2008-03-15 YYYY050/ZZNL040 2008-03-15
#> 2 N02   2008-04-15 YYYY050/ZZNL040 2008-04-15
#> 3 N03   2008-06-15 ZZNL040         2008-06-15
#> 4 N03   2008-07-15 ZZNL040         2008-07-15
#> 5 N05   2014-03-06 ZZNL065         2014-03-06
#> 6 N05   2014-03-07 ZZNL065/ZZNL040 2014-03-07
#> 7 N05   2014-03-13 ZZNL065         2014-03-13

Upvotes: 1

Related Questions