Reputation: 79
Suppose I've got this data:
key id value
----- ---- --------
30 1 A
30 2 B
40 1 C
40 2 D
50 1 A
50 2 C
50 3 D
60 1 A
60 2 B
60 3 C
60 4 D
I want a list aggregation by key in R/SQL that would produce this last column:
key id value Class
----- ---- -------- --------
30 1 A A,B
30 2 B B
40 1 C C,D
40 2 D D
50 1 A A,C,D
50 2 C C,D
50 3 D D
60 1 A A,B,C,D
60 2 B B,C,D
60 3 C C,D
60 4 D D
The value in Class in the current row always includes current and future values within a group (key).
Any Suggestions?
Many thanks!
Upvotes: 0
Views: 392
Reputation: 388982
In R, you can group by key
and create a sequence from id
to the number of rows in the group and paste the value
together.
library(dplyr)
df %>%
group_by(key) %>%
mutate(Class = purrr::map_chr(id, ~toString(value[.x:n()])))
# If id doesn't represent the row number for each key you can use
#mutate(Class = purrr::map_chr(row_number(), ~toString(value[.x:n()])))
# key id value Class
# <int> <int> <chr> <chr>
# 1 30 1 A A, B
# 2 30 2 B B
# 3 40 1 C C, D
# 4 40 2 D D
# 5 50 1 A A, C, D
# 6 50 2 C C, D
# 7 50 3 D D
# 8 60 1 A A, B, C, D
# 9 60 2 B B, C, D
#10 60 3 C C, D
#11 60 4 D D
data
df <- structure(list(key = c(30L, 30L, 40L, 40L, 50L, 50L, 50L, 60L,
60L, 60L, 60L), id = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 3L,
4L), value = c("A", "B", "C", "D", "A", "C", "D", "A", "B", "C",
"D")), class = "data.frame", row.names = c(NA, -11L))
Upvotes: 1
Reputation: 1269793
In standard SQL, you can use listagg()
as a window function. That would look like:
select t.*,
listagg(value, ',') within group (order by id) over (partition by key order by id desc) as class
from t;
However, different databases implement listagg()
differently, so the particular syntax depends on the database.
And most databases do not permit string aggregation as a cumulative window function. So, you can do what you want using subqueries:
select t.*,
(select listagg(value, ',') within group (order by id)
from t t2
where t2.key = t.key and t2.id >= t.id
) as class
from t
order by key, id;
Here is a db<>fiddle, which uses Oracle because that uses listagg()
.
Upvotes: 1