Reputation: 181
I have a > 30,000 rows tibble with ID colums, dates and acts. Sometimes, for an ID, two acts are registered the very same day. In that case would like to aggregate those acts.
Please find below a minimal working example (MWE):
> id <- c ("N02", "N02", "N02", "N02", "N03", "N03", "N05", "N05", "N05", "N05")
> date_1 <- c ("2008-03-15", "2008-03-15", "2008-04-15", "2008-04-15", "2008-06-15", "2008-07-15", "2014-03-06", "2014-03-07", "2014-03-07", "2014-03-13")
> act <- c ("YYYY050", "ZZNL040", "YYYY050", "ZZNL040", "ZZNL040", "ZZNL040", "ZZNL065", "ZZNL065", "ZZNL040", "ZZNL065")
> date_2 <- c ("2008-03-15", "2008-03-15", "2008-04-15", "2008-04-15", "2008-06-15", "2008-07-15", "2014-03-06", "2014-03-07", "2014-03-07", "2014-03-13")
> df1 <- data.frame (id, date_1, act, date_2)
> df1
id date_1 act date_2
1 N02 2008-03-15 YYYY050 2008-03-15
2 N02 2008-03-15 ZZNL040 2008-03-15
3 N02 2008-04-15 YYYY050 2008-04-15
4 N02 2008-04-15 ZZNL040 2008-04-15
5 N03 2008-06-15 ZZNL040 2008-06-15
6 N03 2008-07-15 ZZNL040 2008-07-15
7 N05 2014-03-06 ZZNL065 2014-03-06
8 N05 2014-03-07 ZZNL065 2014-03-07
9 N05 2014-03-07 ZZNL040 2014-03-07
10 N05 2014-03-13 ZZNL065 2014-03-13
In present MWE, ID
N02
has act
YYYY050
and ZZNL040
the same day 2008-03-15
.
In that case of mutiple acts the very same day, I would like aggregate the act values. I need to preserve the tibble structure id est date_1
and date_2
which are sometimes different for further calculations and, therefore use dplyr.
My desired output would be:
> id <- c ("N02", "N02", "N03", "N03", "N05", "N05", "N05")
> date_1 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2014-03-06", "2014-03-07", "2014-03-13")
> act <- c ("YYYY050/ZZNL040", "YYYY050/ZZNL040","ZZNL040", "ZZNL040", "ZZNL065", "ZZNL065/ZZNL040", "ZZNL065")
> date_2 <- c ("2008-03-15", "2008-04-15", "2008-06-15", "2008-07-15", "2014-03-06", "2014-03-07", "2014-03-13")
> df2 <- data.frame (id, date_1, act, date_2)
> df2
id date_1 act date_2
1 N02 2008-03-15 YYYY050/ZZNL040 2008-03-15
2 N02 2008-04-15 YYYY050/ZZNL040 2008-04-15
3 N03 2008-06-15 ZZNL040 2008-06-15
4 N03 2008-07-15 ZZNL040 2008-07-15
5 N05 2014-03-06 ZZNL065 2014-03-06
6 N05 2014-03-07 ZZNL065/ZZNL040 2014-03-07
7 N05 2014-03-13 ZZNL065 2014-03-13
Any idea ? Thank you in advance for your help. Charles.
Upvotes: 1
Views: 32
Reputation: 173793
You could do:
library(dplyr)
df1 %>%
group_by(id, date_1, date_2) %>%
summarise(act = paste(act, collapse = '/')) %>%
select(1, 2, 4, 3)
#> # A tibble: 7 x 4
#> # Groups: id, date_1 [7]
#> id date_1 act date_2
#> <chr> <chr> <chr> <chr>
#> 1 N02 2008-03-15 YYYY050/ZZNL040 2008-03-15
#> 2 N02 2008-04-15 YYYY050/ZZNL040 2008-04-15
#> 3 N03 2008-06-15 ZZNL040 2008-06-15
#> 4 N03 2008-07-15 ZZNL040 2008-07-15
#> 5 N05 2014-03-06 ZZNL065 2014-03-06
#> 6 N05 2014-03-07 ZZNL065/ZZNL040 2014-03-07
#> 7 N05 2014-03-13 ZZNL065 2014-03-13
Upvotes: 1