user17760969
user17760969

Reputation:

Return the last row from the duplicate rows

How can I get only the unique rows based on comparison between three columns in the table.

id date location value
1 2022-09-06 13:09 point 1 1
1 2022-09-06 13:09 point 2 1
2 2022-09-06 13:09 point 1 4
3 2022-09-06 13:10 point 1 2
3 2022-09-06 13:10 point 1 5
3 2022-09-06 13:10 point 1 6

After checking the column "ID", "DATE", and "Location" we can see the only id = 3 has three rows with same value in three columns so I will keep only the last row from.

Expected table:

id date location value
1 2022-09-06 13:09 point 1 1
1 2022-09-06 13:09 point 2 1
2 2022-09-06 13:09 point 1 4
3 2022-09-06 13:10 point 1 6

I have tried using this query but it doesn't return the expected results

SELECT *
FROM my table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE,LOCATION ORDER BY ID, DATE) = 1
order by ID, DATE;

Upvotes: 1

Views: 519

Answers (2)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

So testing you "data" verse your SQL:

with data as (
    select * from values
        (1, '2022-09-06 13:09', 'point 1', 1),
        (1, '2022-09-06 13:09', 'point 2', 1),
        (2, '2022-09-06 13:09', 'point 1', 4),
        (3, '2022-09-06 13:10', 'point 1', 2),
        (3, '2022-09-06 13:10', 'point 1', 5),
        (3, '2022-09-06 13:10', 'point 1', 6)
        t(id, date, location, value)
)
SELECT *
FROM data
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE,LOCATION ORDER BY ID, DATE) = 1
order by ID, DATE;

gives:

ID DATE LOCATION VALUE
1 2022-09-06 13:09 point 1 1
1 2022-09-06 13:09 point 2 1
2 2022-09-06 13:09 point 1 4
3 2022-09-06 13:10 point 1 2

which is not that "expected" answer, the problem is with the input and the columns you have used the three id=3 rows are "all the same" from a order by date perspective, and thus there is no stable way to choose the 6 value, without some extra value.. because if we assume value is a random value, it cannot be used for sorting...

We can implicitly state that "the order the rows are inserted is the order" but that requires capturing that order at the time of insertion, because after insertion there is no under the hood like ROW_ID in Oracle.

So if you want stability you need more facts, thus a some form of sequence:


create table new_table(id number, date timestamp, location text, value number, seq number);


-- 6th of Sep batch
insert into new_table 
    select id, date, location, value, seq8() from values
        (1, '2022-09-06 13:09', 'point 1', 1),
        (1, '2022-09-06 13:09', 'point 2', 1),
        (2, '2022-09-06 13:09', 'point 1', 4)
        t(id, date, location, value);
        
-- 6th of Sep 13:10 batch
insert into new_table 
    select id, date, location, value, seq8() from values
        (3, '2022-09-06 13:10', 'point 1', 2),
        (3, '2022-09-06 13:10', 'point 1', 5),
        (3, '2022-09-06 13:10', 'point 1', 6)
        t(id, date, location, value);
select * from new_table;

gives:

ID DATE LOCATION VALUE SEQ
1 2022-09-06 13:09:00.000 point 1 1 0
1 2022-09-06 13:09:00.000 point 2 1 1
2 2022-09-06 13:09:00.000 point 1 4 2
3 2022-09-06 13:10:00.000 point 1 2 0
3 2022-09-06 13:10:00.000 point 1 5 1
3 2022-09-06 13:10:00.000 point 1 6 2

thus now:

SELECT *
FROM new_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE, LOCATION ORDER BY seq desc) = 1
order by 1,2;

this gives:

ID DATE LOCATION VALUE SEQ
1 2022-09-06 13:09:00.000 point 2 1 1
1 2022-09-06 13:09:00.000 point 1 1 0
2 2022-09-06 13:09:00.000 point 1 4 2
3 2022-09-06 13:10:00.000 point 1 6 2

Upvotes: 0

Lukasz Szozda
Lukasz Szozda

Reputation: 175566

Most likely the data has different time component(second/milisecond):

SELECT * 
FROM my table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TRUNC(DATE, 'MINUTE'), LOCATION 
                           ORDER BY DATE DESC) = 1 
ORDER BY ID, DATE;

ROW_NUMBER() OVER (PARTITION BY ID, DATE,LOCATION ORDER BY ID, DATE) sorting by ID has no effect as it is the same per entire partition.

Upvotes: 1

Related Questions