sifar
sifar

Reputation: 1148

Combine duplicate rows in column as comma separated values - Google Query

If i have 2 columns viz., ID & Name, ID column containing duplicates, and if i want to group by ID to get unique ID's but name column should be a comma-separated list, can this be possible in Google Query?

| ID   | Name |
===============
| 1001 | abc  |
---------------
| 1001 | def  |
---------------
| 1002 | kjg  |
---------------
| 1003 | aof  |
---------------
| 1003 | lmi  |
---------------
| 1004 | xyz  |
---------------

into

| ID   | Name      |
====================
| 1001 | abc, def  |
--------------------
| 1002 | kjg       |
--------------------
| 1003 | aof, lmi  |
--------------------
| 1004 | xyz       |
--------------------

Upvotes: 2

Views: 2366

Answers (4)

player0
player0

Reputation: 1

if you could live with the end-comma present in the output you can try:

=ARRAYFORMULA({QUERY(QUERY({A2:B, B2:B}, 
 "select Col1,max(Col3) 
  where Col1 is not null 
    and Col3 <> ',' 
  group by Col1 
  pivot Col2"),
 "select Col1 offset 1", 0), 
 TRANSPOSE(QUERY(TRANSPOSE(IFERROR(VLOOKUP(QUERY(QUERY({A2:B, B2:B}, 
 "select Col1,max(Col3) 
  where Col1 is not null 
    and Col3 <> ',' 
  group by Col1 
  pivot Col2"),
 "select Col1 offset 1", 0), 
 QUERY(QUERY({A2:B, B2:B&","}, 
 "select Col1,max(Col3) 
  where Col1 is not null 
    and Col3 <> ',' 
  group by Col1 
  pivot Col2"),
 "offset 1", 0), 
 SPLIT(TRANSPOSE(QUERY(TRANSPOSE(IF(QUERY(QUERY({A2:B, B2:B&","}, 
 "select max(Col3) 
  where Col1 is not null 
    and Col3 <> ',' 
  group by Col1    
  pivot Col2"),
 "offset 1", 0)="",,COLUMN(B2:XXX)&",")),,999^99)), ","), 0))),,999^99))})

enter image description here

(tho this was never tested on an ultra-massive dataset but in theory, it should handle anything too)

Upvotes: 1

player0
player0

Reputation: 1

try:

=ARRAYFORMULA({QUERY(QUERY({A2:B, B2:B}, 
 "select Col1,max(Col2) 
  where Col1 is not null 
  group by Col1 
  pivot Col3"), 
 "select Col1 
  offset 1", 0), REGEXREPLACE(TRIM(
 TRANSPOSE(QUERY(TRANSPOSE(QUERY(QUERY({A2:B&",", B2:B}, 
 "select max(Col2) 
  where Col1 is not null 
    and Col2 <> ',' 
  group by Col1 
  pivot Col3"), 
 "offset 1", 0)),,999^9))), ",$", )})

enter image description here

however, this may not work for massive datasets due to TRIM (which is needed to remove empty spaces) and REGEXREPLACE (which is needed to remove the end comma) limitations. otherwise, without it, the formula can handle anything:

=ARRAYFORMULA({QUERY(QUERY({A2:B, B2:B}, 
 "select Col1,max(Col2) 
  where Col1 is not null 
  group by Col1 
  pivot Col3"), 
 "select Col1 
  offset 1", 0), 
 TRANSPOSE(QUERY(TRANSPOSE(QUERY(QUERY({A2:B&",", B2:B}, 
 "select max(Col2) 
  where Col1 is not null 
    and Col2 <> ',' 
  group by Col1 
  pivot Col3"), 
 "offset 1", 0)),,999^9))})

Upvotes: 3

kirkg13
kirkg13

Reputation: 3010

Here is an answer using QUERY.

=ARRAYFORMULA(REGEXREPLACE(TRIM(SPLIT(TRANSPOSE(SPLIT(
 CONCATENATE(TRANSPOSE(QUERY({"♦"&A2:A&"♠", B2:B&", "}, 
 "select max(Col2) where Col2 is not null group by Col2 pivot Col1", 0))), 
 "♦")), "♠")), ",$", ))

This comes directly from this question. Player0 has answers with just amazing formulas that are able to reorganise data in a huge variety of ways.

enter image description here

Upvotes: 1

Krzysztof Dołęgowski
Krzysztof Dołęgowski

Reputation: 2660

I looked through Query specification. I could not find a solution. So I made some formulas that do the job (because I found this task interesting). enter image description here

D2 contains =unique(a2:a)

E2 contains =join(", ",transpose(filter($B$2:$B,$A$2:$A=D2)))and it's copied down.

I had to copy formulas down (far from beautiful formula) Hope you find it helpful.

Reference

Upvotes: 2

Related Questions