kain
kain

Reputation: 5570

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):

SELECT
    media_files.album,
    media_files.artist,
    ARRAY_AGG (media_files. ID) AS media_file_ids
FROM
    media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
    playlist_media_files.playlist_id = 1
GROUP BY
    media_files.album,
    media_files.artist
ORDER BY
    media_files.album ASC

and it's working fine, the goal was to extract album/artist combinations and in the result set have an array of media files ids for that particular combo.

The problem is that I have another column in media files, which is artwork.

artwork is unique for each media file (even in the same album) but in the result set I need to return just the first of the set.

So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like just to return the first (or a random picked one for that collection).

Is that possible to do with only SQL/Window Functions (first_value over..)?

Upvotes: 4

Views: 10115

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657952

"First" pick

Wouldn't it be simpler / cheaper to just use min():

SELECT m.album
      ,m.artist
      ,array_agg(m.id) AS media_file_ids
      ,min(m.artwork)  AS artwork
FROM   playlist_media_files p
JOIN   media_files          m ON m.id = p.media_file_id
WHERE  p.playlist_id = 1
GROUP  BY m.album, m.artist
ORDER  BY m.album, m.artist;

Abitrary / random pick

If you are looking for a random selection, @Craig already provided a solution with truly random picks.

You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.

For arbitrary selection - not truly random, the result will depend on the physical order of rows in the table and implementation-specifics:

WITH x AS (
   SELECT m.album, m.artist, m.id, m.artwork
   FROM   playlist_media_files p
   JOIN   media_files          m ON m.id = p.media_file_id
   )
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM  (
   SELECT album, artist, array_agg(id) AS media_file_ids
   FROM   x
   ) a
JOIN  (
   SELECT DISTINCT ON (1,2)  album, artist, artwork
   FROM x
   ) b USING (album, artist);

For truly random results, you can add an ORDER BY .. random() like this to subquery b:

JOIN  (
   SELECT DISTINCT ON (1, 2)  album, artist, artwork
   FROM   x
   ORDER  BY 1, 2, random()
   ) b USING (album, artist);

Upvotes: 3

Craig Ringer
Craig Ringer

Reputation: 324741

Yes, it's possible. First, let's tweak your query by adding alias and explicit column qualifiers so it's clear what comes from where - assuming I've guessed correctly, since I can't be sure without table definitions:

SELECT
    mf.album,
    mf.artist,
    ARRAY_AGG (mf.id) AS media_file_ids
FROM
    "media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
    pmf.playlist_id = 1
GROUP BY
    mf.album,
    mf.artist
ORDER BY
    mf.album ASC

Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.

What you really want is something like an pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate and it isn't really worth implementing it for the job. You could use min(artwork) or max(artwork) and you may find that this actually performs better than the later solutions.

To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:

(SELECT mf2.artwork 
 FROM media_files mf2 
 WHERE mf2.artist = mf.artist
   AND mf2.album = mf.album
 LIMIT 1) AS picked_artwork

You can at a performance cost randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.

Alternately, here's a quick and dirty way to implement selection of a random row in-line:

(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)] 

Since there's no sample data I can't test these modifications. Let me know if there's an issue.

Upvotes: 4

Related Questions