Nathan Wienert

Reputation: 1623

Order by join column but use distinct on another

I'm building a system with the following tables: follows, stations, broadcasts and songs.

A user follows stations, which have songs on them through broadcasts.

I'm building a "feed" of songs for a user based on the stations they follow.

Here's the query:

SELECT DISTINCT ON ("broadcasts"."created_at", "songs"."id") songs.*
FROM "songs"
INNER JOIN "broadcasts" ON "songs"."shared_id" = "broadcasts"."song_id"
INNER JOIN "stations" ON "broadcasts"."station_id" = "stations"."id"
INNER JOIN "follows" ON "stations"."id" = "follows"."station_id"
WHERE "follows"."user_id" = 2
ORDER BY broadcasts.created_at desc
LIMIT 18

[Screenshot: query results showing duplicate songs]

Note: shared_id is the same as id.

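As you can see, I'm getting duplicate songs, which I don't want. I found out from a previous question that this is because DISTINCT ON includes broadcasts.created_at: a song broadcast at two different times produces two distinct (created_at, id) pairs, so it shows up twice.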

My question is: How do I modify this query so it will return only unique songs based on their id but still order by broadcasts.created_at?
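To make the duplication concrete, here is a small reproduction. It is a sketch under several assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of PostgreSQL, the sample rows (and the title column) are hypothetical, and because SQLite has no DISTINCT ON, a plain SELECT DISTINCT over the same two columns stands in for the question's DISTINCT ON ("broadcasts"."created_at", "songs"."id").

```python
import sqlite3

# Hypothetical sample data (not from the question): user 2 follows
# stations 10 and 11; song 1 is broadcast on both, song 2 only once.
# created_at is stored as ISO-style text, so text order = time order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs (id INTEGER PRIMARY KEY, shared_id INTEGER, title TEXT);
    CREATE TABLE stations (id INTEGER PRIMARY KEY);
    CREATE TABLE follows (user_id INTEGER, station_id INTEGER);
    CREATE TABLE broadcasts (id INTEGER PRIMARY KEY, song_id INTEGER,
                             station_id INTEGER, created_at TEXT);
    INSERT INTO songs VALUES (1, 1, 'Song A'), (2, 2, 'Song B');
    INSERT INTO stations VALUES (10), (11);
    INSERT INTO follows VALUES (2, 10), (2, 11);
    INSERT INTO broadcasts VALUES
        (100, 1, 10, '2012-07-01 09:00'),
        (101, 2, 10, '2012-07-01 10:00'),
        (102, 1, 11, '2012-07-01 11:00');
""")

# SELECT DISTINCT over (created_at, song id) emulates the question's
# DISTINCT ON over the same pair of columns.
rows = conn.execute("""
    SELECT DISTINCT broadcasts.created_at, songs.id
    FROM songs
    INNER JOIN broadcasts ON songs.shared_id = broadcasts.song_id
    INNER JOIN stations ON broadcasts.station_id = stations.id
    INNER JOIN follows ON stations.id = follows.station_id
    WHERE follows.user_id = 2
    ORDER BY broadcasts.created_at DESC
    LIMIT 18
""").fetchall()

song_ids = [song_id for _created_at, song_id in rows]
print(song_ids)  # [1, 2, 1]
```

Song 1 was broadcast at two different times, so it forms two distinct (created_at, id) pairs and survives the de-duplication twice.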

Upvotes: 1

Views: 2441

Answers (4)

Eric Boehs

Reputation: 1327

I had a very similar query involving listens, tracks and albums, and it took me hours to figure out.

If you GROUP BY songs.id, you can get it to work by ordering by MAX(broadcasts.created_at) DESC, so that each song is ranked by its most recent broadcast.

Here's what the full SQL looks like:

SELECT songs.* FROM "songs"
INNER JOIN "broadcasts" ON "songs"."shared_id" = "broadcasts"."song_id"
INNER JOIN "stations" ON "broadcasts"."station_id" = "stations"."id"
INNER JOIN "follows" ON "stations"."id" = "follows"."station_id"
WHERE "follows"."user_id" = 2
GROUP BY songs.id
ORDER BY MAX(broadcasts.created_at) desc
LIMIT 18;
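A quick way to check this answer is to run it against a small hypothetical data set. The sketch below assumes Python's sqlite3 module and an in-memory SQLite database rather than PostgreSQL; the sample rows and any columns beyond those used in the question (for example songs.title) are made up for illustration. SQLite accepts songs.* alongside GROUP BY songs.id; PostgreSQL (9.1 and later) accepts it only when songs.id is the table's primary key.

```python
import sqlite3

# Hypothetical sample data: song 1 is broadcast twice (09:00 and 11:00)
# on stations followed by user 2, song 2 once (10:00).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs (id INTEGER PRIMARY KEY, shared_id INTEGER, title TEXT);
    CREATE TABLE stations (id INTEGER PRIMARY KEY);
    CREATE TABLE follows (user_id INTEGER, station_id INTEGER);
    CREATE TABLE broadcasts (id INTEGER PRIMARY KEY, song_id INTEGER,
                             station_id INTEGER, created_at TEXT);
    INSERT INTO songs VALUES (1, 1, 'Song A'), (2, 2, 'Song B');
    INSERT INTO stations VALUES (10), (11);
    INSERT INTO follows VALUES (2, 10), (2, 11);
    INSERT INTO broadcasts VALUES
        (100, 1, 10, '2012-07-01 09:00'),
        (101, 2, 10, '2012-07-01 10:00'),
        (102, 1, 11, '2012-07-01 11:00');
""")

# The answer's query: one row per song, ordered by its latest broadcast.
rows = conn.execute("""
    SELECT songs.* FROM songs
    INNER JOIN broadcasts ON songs.shared_id = broadcasts.song_id
    INNER JOIN stations ON broadcasts.station_id = stations.id
    INNER JOIN follows ON stations.id = follows.station_id
    WHERE follows.user_id = 2
    GROUP BY songs.id
    ORDER BY MAX(broadcasts.created_at) DESC
    LIMIT 18
""").fetchall()

print(rows)  # [(1, 1, 'Song A'), (2, 2, 'Song B')]
```

Song 1 now appears only once, ranked ahead of song 2 because its latest broadcast (11:00) is more recent.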

Upvotes: 0

Shehzad Bilal

Reputation: 2523

Try adding GROUP BY songs.id.

Upvotes: 0

Luxspes

Reputation: 6760

You can use a Common Table Expression (CTE) if you want a cleaner query; nested subqueries make things harder to read.

It would look like this:

WITH a as (
        SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
        FROM follows aa
        INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
        INNER JOIN songs cc ON bb.song_id = cc.shared_id
        WHERE aa.user_id = 2
        GROUP BY bb.song_id
    )
SELECT
    a.maxcreated,
    b.*
FROM a INNER JOIN
    songs b ON a.song_id = b.id
ORDER BY
    a.maxcreated DESC
LIMIT 18

Using a CTE improves the readability and maintainability of complex queries. The query can be divided into separate, simple, logical building blocks, which can then be combined into more complex interim CTEs until the final result set is produced.

Upvotes: 2

Zane Bien

Reputation: 23125

Try this solution:

SELECT a.maxcreated, b.*
FROM
    (
        SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
        FROM follows aa
        INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
        WHERE aa.user_id = 2
        GROUP BY bb.song_id
    ) a
INNER JOIN songs b ON a.song_id = b.id
ORDER BY a.maxcreated DESC
LIMIT 18

The subquery in the FROM clause retrieves the distinct song_ids broadcast by any of the stations the user follows, along with the latest broadcast date (maxcreated) for each song. It has to be a subquery because every non-aggregated column we select must appear in the GROUP BY, and we want to group only by song_id, so that we get one row per song with its maxcreated regardless of the station.

The outer query then joins that result to the songs table to get the song information for each unique song_id.
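The CTE from the earlier answer and the derived table in this one compute the same thing, so running both against the same data should give identical rows. This is a sketch assuming Python's sqlite3 module with an in-memory SQLite database in place of PostgreSQL, plus hypothetical sample data; both queries are copied from the answers with only whitespace and keyword case changed.

```python
import sqlite3

# Hypothetical sample data: song 1 broadcast at 09:00 and 11:00,
# song 2 at 10:00, all on stations followed by user 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs (id INTEGER PRIMARY KEY, shared_id INTEGER, title TEXT);
    CREATE TABLE follows (user_id INTEGER, station_id INTEGER);
    CREATE TABLE broadcasts (id INTEGER PRIMARY KEY, song_id INTEGER,
                             station_id INTEGER, created_at TEXT);
    INSERT INTO songs VALUES (1, 1, 'Song A'), (2, 2, 'Song B');
    INSERT INTO follows VALUES (2, 10), (2, 11);
    INSERT INTO broadcasts VALUES
        (100, 1, 10, '2012-07-01 09:00'),
        (101, 2, 10, '2012-07-01 10:00'),
        (102, 1, 11, '2012-07-01 11:00');
""")

# Derived-table (subquery) version from this answer.
SUBQUERY = """
    SELECT a.maxcreated, b.*
    FROM (
        SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
        FROM follows aa
        INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
        WHERE aa.user_id = 2
        GROUP BY bb.song_id
    ) a
    INNER JOIN songs b ON a.song_id = b.id
    ORDER BY a.maxcreated DESC
    LIMIT 18
"""

# CTE version from the earlier answer.
CTE = """
    WITH a AS (
        SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
        FROM follows aa
        INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
        INNER JOIN songs cc ON bb.song_id = cc.shared_id
        WHERE aa.user_id = 2
        GROUP BY bb.song_id
    )
    SELECT a.maxcreated, b.*
    FROM a INNER JOIN songs b ON a.song_id = b.id
    ORDER BY a.maxcreated DESC
    LIMIT 18
"""

sub_rows = conn.execute(SUBQUERY).fetchall()
cte_rows = conn.execute(CTE).fetchall()
print(sub_rows)
# [('2012-07-01 11:00', 1, 1, 'Song A'), ('2012-07-01 10:00', 2, 2, 'Song B')]
print(sub_rows == cte_rows)  # True
```

Unlike the plain GROUP BY songs.id version, both forms also return maxcreated, which is handy if the feed needs to show when each song was last broadcast.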

Upvotes: 2
