David Ranzolin
David Ranzolin

Reputation: 1074

Using string_agg to create distinct groups by row

Consider 'Table 1' below:

id x
1 A
1 B
1 C
2 B
2 C
3 A

I want to query Table 1 to produce the following:

id x grps
1 A B, C
1 B A, C
1 C A, B
2 B C
2 C B
3 A

Using string_agg and grouping by id produces "A, B, C" for all id 1 rows, and I want to exclude the x value from grps for that row (where x = "A", grps should equal "B, C", etc.) Is there some self-join/string_agg/grouping trick?

Edit: My current query is below:

library(duckdb)
#> Loading required package: DBI
df <- tibble::tibble(
  id = c(1, 1, 1, 2, 2, 3),
  x = c("A", "B", "C", "B", "C", "A")
)

con <- dbConnect(duckdb())
duckdb_register(con, "tab1", df)
q <- "
WITH tab_grps AS (
  SELECT id,
         string_agg(x, ',') AS grps
  FROM tab1
  GROUP BY id
)
SELECT tab1.id,
       tab1.x,
       tab_grps.grps
FROM tab1
LEFT JOIN tab_grps ON tab1.id = tab_grps.id 
"
dbGetQuery(con, q)
#>   id x  grps
#> 1  1 A A,B,C
#> 2  1 B A,B,C
#> 3  1 C A,B,C
#> 4  2 B   B,C
#> 5  2 C   B,C
#> 6  3 A     A
dbDisconnect(con)

Created on 2024-05-22 with reprex v2.0.2

Upvotes: 1

Views: 89

Answers (1)

jqurious
jqurious

Reputation: 21144

It sounds like you may want a Window Function with an EXCLUDE CURRENT ROW clause.

duckdb.sql("""
from (values 
   (1, 'A'), (1, 'B'), (1, 'C'), 
   (2, 'B'), (2, 'C'), (3, 'A')
) t(id, x)

select
   *,
   string_agg(x) over(
      partition by id
      order by id
      rows between unbounded preceding and unbounded following  
      exclude current row
   ) as grps
""")
┌───────┬─────────┬─────────┐
│  id   │    x    │  grps   │
│ int32 │ varchar │ varchar │
├───────┼─────────┼─────────┤
│     1 │ A       │ B,C     │
│     1 │ B       │ A,C     │
│     1 │ C       │ A,B     │
│     2 │ B       │ C       │
│     2 │ C       │ B       │
│     3 │ A       │ NULL    │
└───────┴─────────┴─────────┘

Upvotes: 4

Related Questions