Yana K.
Yana K.

Reputation: 2270

Concatenate/merge array values during grouping/aggregation

I have a table with the an array column type:

 title       tags
"ridealong";"{comedy,other}"
"ridealong";"{comedy,tragedy}"
"freddyjason";"{horror,silliness}"

I would like to write a query that produces a single array per title(in an ideal world it would be a set/deduplicated array)

e.g.

select array_cat(tags),title from my_test group by title

The above query doesn't work of course, but I would like to produce 2 rows:

"ridealong";"{comedy,other,tragedy}"
"freddyjason";"{horror,silliness}"

Any help or pointers would be very much appreciated (I am using Postgres 9.1)


Based on Craig's help I ended up with the following (slightly altered syntax since 9.1 complains about the query exactly as he shows it)

SELECT t1.title, array_agg(DISTINCT tag.tag) 
FROM my_test t1, (select unnest(tags) as tag,title from my_test) as tag 
where tag.title=t1.title
GROUP BY t1.title;

Upvotes: 42

Views: 40122

Answers (3)

Konstantinos Gallis
Konstantinos Gallis

Reputation: 1140

Notice

The above solutions have the disadvantage that they do not remove the duplicate values.

So, we end up having duplicate values.

This solution keeps only unique values.

Following the approach of custom aggregate:

1. Create the function to concatenate arrays and remove duplicates

CREATE OR REPLACE FUNCTION array_uniq_cat_agg(anyarray, anyarray)
RETURNS anyarray LANGUAGE SQL AS $$
  SELECT ARRAY(
    SELECT DISTINCT unnest($1 || $2)
  )
$$;

2. Create the aggregate using the custom function

CREATE AGGREGATE array_uniq_cat_agg(anyarray) (
  SFUNC = array_uniq_cat_agg,
  STYPE = anyarray,
  INITCOND = '{}'
);

3. Use the custom aggregate function in your query

SELECT 
  title, 
  array_uniq_cat_agg(tags) AS unique_tags
FROM 
  my_test 
GROUP BY 
  title;

SQL Fiddle

Upvotes: 0

pozs
pozs

Reputation: 36214

The obvious solution would be the LATERAL join (which also suggested by @CraigRinger), but that is added to PostgreSQL in 9.3.

In 9.1 you cannot avoid the sub-query, but you can simplify it:

SELECT title, array_agg(DISTINCT tag)
FROM (SELECT title, unnest(tags) FROM my_test) AS t(title, tag)
GROUP BY title;

SQL Fiddle

Upvotes: 6

Craig Ringer
Craig Ringer

Reputation: 324265

Custom aggregate

Approach 1: define a custom aggregate. Here's one I wrote earlier.

CREATE TABLE my_test(title text, tags text[]);

INSERT INTO my_test(title, tags) VALUES
('ridealong', '{comedy,other}'),
('ridealong', '{comedy,tragedy}'),
('freddyjason', '{horror,silliness}');

CREATE AGGREGATE array_cat_agg(anyarray) (
  SFUNC=array_cat,
  STYPE=anyarray
);

select title, array_cat_agg(tags) from my_test group by title;

LATERAL query

... or since you don't want to preserve order and want to deduplicate, you could use a LATERAL query like:

SELECT title, array_agg(DISTINCT tag ORDER BY tag) 
FROM my_test, unnest(tags) tag 
GROUP BY title;

in which case you don't need the custom aggregate. This one is probably a fair bit slower for big data sets due to the deduplication. Removing the ORDER BY if not required may help, though.

Upvotes: 41

Related Questions