SELECT from subquery without having to specify all columns in GROUP BY

Question

Idea is to query an article table where an article has a given tag, and then to STRING_AGG all (even unrelated) tags that belong to that article row.

Example tables and query:

CREATE TABLE article (id SERIAL, body TEXT);
CREATE TABLE article_tag (article INT, tag INT);
CREATE TABLE tag (id SERIAL, title TEXT);

SELECT DISTICT ON (id)
    q.id, q.body, STRING_AGG(q.tag_title, '|') tags
FROM (
    SELECT a.*, tag.title tag_title
    FROM article a
        LEFT JOIN article_tag x ON a.id = tag.article
        LEFT JOIN tag ON tag.id = x.tag
    WHERE tag.title = 'someTag'
) q
GROUP BY q.id

Running the above, postgres require that the q.body must be included in GROUP BY:

ERROR:  column "q.body" must appear in the GROUP BY clause or be used in an aggregate function

As I understand it, it's because subquery q doesn't include any PRIMARY key.

I naively thought that the DISTINCT ON would supplement that, but it doesn't seem so.

Is there a way to mark a column in a subquery as PRIMARY so that we don't have to list all columns in GROUP BY clause?

If we do have to list all columns in GROUP BY clause, does that incur significant perf cost?

EDIT: to elaborate, since PostgreSQL 9.1 you don't have to supply non-primary (i.e. functionally dependent) keys when using GROUP BY, e.g. following query works fine:

SELECT a.id, a.body, STRING_AGG(tag.title, '|') tags
FROM article a
    LEFT JOIN article_tag x ON a.id = tag.article
    LEFT JOIN tag ON tag.id = x.tag
GROUP BY a.id

I was wondering if I can leverage the same behavior, but with a subquery (by somehow indicating that q.id is a PRIMARY key).

Łukasz Kamiński · Accepted Answer

It sadly doesn't work when you wrap your primary key in subquery and I don't know of any way to "mark it" as you suggested.

You can try this workaround using window function and distinct:

CREATE TABLE test1 (id serial primary key, name text, value text);
CREATE TABLE test2 (id serial primary key, test1_id int, value text);

INSERT INTO test1(name, value)
           values('name1', 'test01'), ('name2', 'test02'), ('name3', 'test03');

INSERT INTO test2(test1_id, value)
           values(1, 'test1'), (1, 'test2'), (3, 'test3');

SELECT DISTINCT ON (id) id, name, string_agg(value2, '|') over (partition by id)
  FROM (SELECT test1.*, test2.value AS value2
          FROM test1
          LEFT JOIN test2 ON test2.test1_id = test1.id) AS sub;

id  name    string_agg
1   name1   test1|test2
2   name2   null
3   name3   test3

Demo

SELECT from subquery without having to specify all columns in GROUP BY

Answers (2)

Related Questions