Daniel
Daniel

Reputation: 3527

SQL - Ignoring / Removing Duplicates based on Two Columns

Suppose I have a single table actions like so:

+----+------------+--------+-------+
| id |    rdate   | clicks | epoch |
+----+------------+--------+-------+
|  1 | 2020-01-01 |    100 |  1200 |
|  1 | 2020-01-01 |     95 |  1100 |
|  1 | 2020-10-12 |     42 |  1000 |
|  1 | 2020-10-12 |     66 |   900 |
+----+------------+--------+-------+

I am trying to write a query that gives me the number of clicks based on the MAX(epoch) for records with identical values in the id and rdate columns - the end result should be: (note, I do not need the epoch column in the result)

+----+------------+--------+
| id |    date    | clicks |
+----+------------+--------+
|  1 | 2020-01-01 |    100 |
|  1 | 2020-10-12 |     42 |
+----+------------+--------+

I have tried the following query but the duplicates are still present in the result. The group by query does remove duplicates when run by itself, but the inner join to get the clicks does not work as intended.

SELECT
    id,
    rdate,
    clicks
FROM actions a
INNER JOIN (
    SELECT 
        id,
        rdate, 
        MAX(epoch)
    from actions  
    group by 
        id,
        rdate
) b 
on a.id = b.id
and a.rdate = b.rdate;

Upvotes: 1

Views: 65

Answers (2)

Lukasz Szozda
Lukasz Szozda

Reputation: 175556

You could use QUALIFY and windowed function:

SELECT *
FROM actions
QUALIFY ROW_NUMBER() OVER(PARTITION BY id, rdate ORDER BY epoch DESC) = 1

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520878

I suggest using ROW_NUMBER here:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id, rdate ORDER BY epoch DESC) rn
    FROM actions
)

SELECT id, rdate, clicks, epoch
FROM cte
WHERE rn = 1;

If you want to stick with your current join approach, then you need to fix the logic such that the join to the subquery also restricts the number of clicks:

SELECT a1.*
FROM actions a1
INNER JOIN
(
    SELECT id, rdate, MAX(epoch) AS max_epoch
    FROM actions
    GROUP BY id, rdate
) a2
    ON a2.id = a1.id AND a2.rdate = a1.rdate AND a2.max_epoch = a1.epoch;

Upvotes: 1

Related Questions