Bernice
Bernice

Reputation: 79

Concatenate values in a column by group within a loop (R or SQL)

Suppose I've got this data:

 key  id   value            
----- ---- --------
   30  1    A        
   30  2    B               
   40  1    C    
   40  2    D       
   50  1    A  
   50  2    C  
   50  3    D          
   60  1    A
   60  2    B
   60  3    C
   60  4    D

I want a list aggregation by key in R/SQL that would produce this last column:

  key  id   value     Class       
----- ---- --------   --------
   30  1    A         A,B
   30  2    B         B      
   40  1    C         C,D
   40  2    D         D
   50  1    A         A,C,D
   50  2    C         C,D
   50  3    D         D 
   60  1    A         A,B,C,D
   60  2    B         B,C,D
   60  3    C         C,D
   60  4    D         D

The value in Class in the current row always includes current and future values within a group (key).

Any Suggestions?

Many thanks!

Upvotes: 0

Views: 392

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388982

In R, you can group by key and create a sequence from id to the number of rows in the group and paste the value together.

library(dplyr)

df %>%
  group_by(key) %>%
  mutate(Class = purrr::map_chr(id, ~toString(value[.x:n()])))
  # If id doesn't represent the row number for each key you can use
  #mutate(Class = purrr::map_chr(row_number(), ~toString(value[.x:n()])))

#     key    id value Class     
#   <int> <int> <chr> <chr>     
# 1    30     1 A     A, B      
# 2    30     2 B     B         
# 3    40     1 C     C, D      
# 4    40     2 D     D         
# 5    50     1 A     A, C, D   
# 6    50     2 C     C, D      
# 7    50     3 D     D         
# 8    60     1 A     A, B, C, D
# 9    60     2 B     B, C, D   
#10    60     3 C     C, D      
#11    60     4 D     D         

data

df <- structure(list(key = c(30L, 30L, 40L, 40L, 50L, 50L, 50L, 60L, 
60L, 60L, 60L), id = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 
4L), value = c("A", "B", "C", "D", "A", "C", "D", "A", "B", "C", 
"D")), class = "data.frame", row.names = c(NA, -11L))

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269793

In standard SQL, you can use listagg() as a window function. That would look like:

select t.*,
       listagg(value, ',') within group (order by id) over (partition by key order by id desc) as class
from t;

However, different databases implement listagg() differently, so the particular syntax depends on the database.

And most databases do not permit string aggregation as a cumulative window function. So, you can do what you want using subqueries:

select t.*,
       (select listagg(value, ',') within group (order by id)
        from t t2
        where t2.key = t.key and t2.id >= t.id
       ) as class
from t
order by key, id;

Here is a db<>fiddle, which uses Oracle because that uses listagg().

Upvotes: 1

Related Questions