MREES
MREES

Reputation: 25

combine distinct row values into a string - sql

I would like to take cells in every row and make them into a string of names... My method already deals with casing.

For example, the table;

'john' |        | 'smith' | 'smith'    
'john' | 'paul' |         | 'smith'
'john' | 'john' | 'john'  |    

returns:

'john smith'
'john paul smith'
'john'

This would need to run postgreSQL 8.2.15 of postgres so I can't make use of potentially useful functions like CONCAT, and data is in a greenplum db.

Alternatively, a method to directly delete duplicate tokens in a list of strings would let me achieve the larger objective. For example:

'john smith john smith'
'john john smith'
'smith john smith'

returns

'john smith'
'john smith'
'smith john'

The order of the tokens is not important, as long as all the unique values are returned, once only.

Thanks

Upvotes: 0

Views: 3559

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269443

I would do this by unpivoting the data and then reaggregation:

select id, string_agg(distinct col)
from (select id, col1 from t union all
      select id, col2 from t union all
      select id, col3 from t union all
      select id, col4 from t
     ) t
where col is not null
group by id;

This assumes that each row has an unique id.

You can also use a giant case:

select concat_ws(',',
                 col1,
                 (case when col2 <> col1 then col2 end),
                 (case when col3 <> col2 and col3 <> col1 then col3 end),
                 (case when col4 <> col3 and col4 <> col2 and col4 <> col1 then col4 end)
                ) as newcol
from t;

In ancient versions of Postgres, you can phrase this as:

select trim(leading ',' from
            (coalesce(',' || col1, '') ||
             (case when col2 <> col1 then ',' || col2 else '' end) ||
             (case when col3 <> col2 and col3 <> col1 then ',' || col3 else '' end),
             (case when col4 <> col3 and col4 <> col2 and col4 <> col1 then ',' || col4 else '' end)
            )
           ) as newcol
from t;

Upvotes: 0

Alessio Demurtas
Alessio Demurtas

Reputation: 14

I've come up with a solution for you! :)

The following query returns the four columns (which I named col_1,2,3and 4) and removes the duplicates by joining the test_table with itself.

Here is the code:

SELECT t1.col_1, t2.col_2, t3.col_3, t4.col_4

FROM (
    SELECT id, col_1
        FROM test_table
) AS t1

LEFT JOIN (
    SELECT id, col_2
        FROM test_table
) as t2

ON (t2.id = t1.id and t2.col_2 <> t1.col_1)


LEFT JOIN (
    SELECT id, col_3
        FROM test_table
) as t3

ON (t3.id = t1.id and t3.col_3 <> t1.col_1 and t3.col_3 <> t2.col_2)



LEFT JOIN (
    SELECT id, col_4
        FROM test_table
) as t4

ON (t4.id = t1.id and t4.col_4 <> t1.col_1 and t4.col_4 <> t2.col_2 and t4.col_4 <> t3.col_3);

If you want to obtain the final string, you just substitute the "SELECT" row with this one:

SELECT trim(both ' ' FROM  (COALESCE(t1.col_1, '') || ' ' ||  COALESCE(t2.col_2, '') || ' ' || COALESCE(t3.col_3, '') || ' ' || COALESCE(t4.col_4, '')))

this should work with your version of postgres, according with the docs:

[for the trim and concatenation functions]

https://www.postgresql.org/docs/8.2/static/functions-string.html

//***************************************************

[for the coalesce function]

https://www.postgresql.org/docs/8.2/static/functions-conditional.html

Please let me know if I've been of help :)

P.S. Your question sounds like a bad database design: I would have those columns moved on a table in which you could do this operation by using a group by or something similar. Moreover I would do the string concatenation on a separate script. But that's my way of doing :)

Upvotes: 0

rd_nielsen
rd_nielsen

Reputation: 2459

Normalize your table structure, select distinct name values from that table, create a function to aggregate strings (see, e.g., How to concatenate strings of a string field in a PostgreSQL 'group by' query?), and apply that function. Except for the aggregate function creation, this could all be done in a single statement or view.

Upvotes: 2

Related Questions