Return the last row from the duplicate rows

Question

How can I get only the unique rows based on comparison between three columns in the table.

id	date	location	value
1	2022-09-06 13:09	point 1	1
1	2022-09-06 13:09	point 2	1
2	2022-09-06 13:09	point 1	4
3	2022-09-06 13:10	point 1	2
3	2022-09-06 13:10	point 1	5
3	2022-09-06 13:10	point 1	6

After checking the column "ID", "DATE", and "Location" we can see the only id = 3 has three rows with same value in three columns so I will keep only the last row from.

Expected table:

id	date	location	value
1	2022-09-06 13:09	point 1	1
1	2022-09-06 13:09	point 2	1
2	2022-09-06 13:09	point 1	4
3	2022-09-06 13:10	point 1	6

I have tried using this query but it doesn't return the expected results

SELECT *
FROM my table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE,LOCATION ORDER BY ID, DATE) = 1
order by ID, DATE;

Simeon Pilgrim · Accepted Answer

So testing you "data" verse your SQL:

with data as (
    select * from values
        (1, '2022-09-06 13:09', 'point 1', 1),
        (1, '2022-09-06 13:09', 'point 2', 1),
        (2, '2022-09-06 13:09', 'point 1', 4),
        (3, '2022-09-06 13:10', 'point 1', 2),
        (3, '2022-09-06 13:10', 'point 1', 5),
        (3, '2022-09-06 13:10', 'point 1', 6)
        t(id, date, location, value)
)
SELECT *
FROM data
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE,LOCATION ORDER BY ID, DATE) = 1
order by ID, DATE;

gives:

ID	DATE	LOCATION	VALUE
1	2022-09-06 13:09	point 1	1
1	2022-09-06 13:09	point 2	1
2	2022-09-06 13:09	point 1	4
3	2022-09-06 13:10	point 1	2

which is not that "expected" answer, the problem is with the input and the columns you have used the three id=3 rows are "all the same" from a order by date perspective, and thus there is no stable way to choose the 6 value, without some extra value.. because if we assume value is a random value, it cannot be used for sorting...

We can implicitly state that "the order the rows are inserted is the order" but that requires capturing that order at the time of insertion, because after insertion there is no under the hood like ROW_ID in Oracle.

So if you want stability you need more facts, thus a some form of sequence:


create table new_table(id number, date timestamp, location text, value number, seq number);


-- 6th of Sep batch
insert into new_table 
    select id, date, location, value, seq8() from values
        (1, '2022-09-06 13:09', 'point 1', 1),
        (1, '2022-09-06 13:09', 'point 2', 1),
        (2, '2022-09-06 13:09', 'point 1', 4)
        t(id, date, location, value);
        
-- 6th of Sep 13:10 batch
insert into new_table 
    select id, date, location, value, seq8() from values
        (3, '2022-09-06 13:10', 'point 1', 2),
        (3, '2022-09-06 13:10', 'point 1', 5),
        (3, '2022-09-06 13:10', 'point 1', 6)
        t(id, date, location, value);

select * from new_table;

gives:

ID	DATE	LOCATION	VALUE	SEQ
1	2022-09-06 13:09:00.000	point 1	1	0
1	2022-09-06 13:09:00.000	point 2	1	1
2	2022-09-06 13:09:00.000	point 1	4	2
3	2022-09-06 13:10:00.000	point 1	2	0
3	2022-09-06 13:10:00.000	point 1	5	1
3	2022-09-06 13:10:00.000	point 1	6	2

thus now:

SELECT *
FROM new_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE, LOCATION ORDER BY seq desc) = 1
order by 1,2;

this gives:

ID	DATE	LOCATION	VALUE	SEQ
1	2022-09-06 13:09:00.000	point 2	1	1
1	2022-09-06 13:09:00.000	point 1	1	0
2	2022-09-06 13:09:00.000	point 1	4	2
3	2022-09-06 13:10:00.000	point 1	6	2

Return the last row from the duplicate rows

Answers (2)

Related Questions