Reputation:
How can I get only the unique rows based on comparison between three columns in the table.
id | date | location | value |
---|---|---|---|
1 | 2022-09-06 13:09 | point 1 | 1 |
1 | 2022-09-06 13:09 | point 2 | 1 |
2 | 2022-09-06 13:09 | point 1 | 4 |
3 | 2022-09-06 13:10 | point 1 | 2 |
3 | 2022-09-06 13:10 | point 1 | 5 |
3 | 2022-09-06 13:10 | point 1 | 6 |
After checking the column "ID", "DATE", and "Location" we can see the only id = 3 has three rows with same value in three columns so I will keep only the last row from.
Expected table:
id | date | location | value |
---|---|---|---|
1 | 2022-09-06 13:09 | point 1 | 1 |
1 | 2022-09-06 13:09 | point 2 | 1 |
2 | 2022-09-06 13:09 | point 1 | 4 |
3 | 2022-09-06 13:10 | point 1 | 6 |
I have tried using this query but it doesn't return the expected results
SELECT *
FROM my table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE,LOCATION ORDER BY ID, DATE) = 1
order by ID, DATE;
Upvotes: 1
Views: 519
Reputation: 25903
So testing you "data" verse your SQL:
with data as (
select * from values
(1, '2022-09-06 13:09', 'point 1', 1),
(1, '2022-09-06 13:09', 'point 2', 1),
(2, '2022-09-06 13:09', 'point 1', 4),
(3, '2022-09-06 13:10', 'point 1', 2),
(3, '2022-09-06 13:10', 'point 1', 5),
(3, '2022-09-06 13:10', 'point 1', 6)
t(id, date, location, value)
)
SELECT *
FROM data
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE,LOCATION ORDER BY ID, DATE) = 1
order by ID, DATE;
gives:
ID | DATE | LOCATION | VALUE |
---|---|---|---|
1 | 2022-09-06 13:09 | point 1 | 1 |
1 | 2022-09-06 13:09 | point 2 | 1 |
2 | 2022-09-06 13:09 | point 1 | 4 |
3 | 2022-09-06 13:10 | point 1 | 2 |
which is not that "expected" answer, the problem is with the input and the columns you have used the three id=3
rows are "all the same" from a order by date perspective, and thus there is no stable way to choose the 6 value, without some extra value.. because if we assume value is a random value, it cannot be used for sorting...
We can implicitly state that "the order the rows are inserted is the order" but that requires capturing that order at the time of insertion, because after insertion there is no under the hood like ROW_ID in Oracle.
So if you want stability you need more facts, thus a some form of sequence:
create table new_table(id number, date timestamp, location text, value number, seq number);
-- 6th of Sep batch
insert into new_table
select id, date, location, value, seq8() from values
(1, '2022-09-06 13:09', 'point 1', 1),
(1, '2022-09-06 13:09', 'point 2', 1),
(2, '2022-09-06 13:09', 'point 1', 4)
t(id, date, location, value);
-- 6th of Sep 13:10 batch
insert into new_table
select id, date, location, value, seq8() from values
(3, '2022-09-06 13:10', 'point 1', 2),
(3, '2022-09-06 13:10', 'point 1', 5),
(3, '2022-09-06 13:10', 'point 1', 6)
t(id, date, location, value);
select * from new_table;
gives:
ID | DATE | LOCATION | VALUE | SEQ |
---|---|---|---|---|
1 | 2022-09-06 13:09:00.000 | point 1 | 1 | 0 |
1 | 2022-09-06 13:09:00.000 | point 2 | 1 | 1 |
2 | 2022-09-06 13:09:00.000 | point 1 | 4 | 2 |
3 | 2022-09-06 13:10:00.000 | point 1 | 2 | 0 |
3 | 2022-09-06 13:10:00.000 | point 1 | 5 | 1 |
3 | 2022-09-06 13:10:00.000 | point 1 | 6 | 2 |
thus now:
SELECT *
FROM new_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, DATE, LOCATION ORDER BY seq desc) = 1
order by 1,2;
this gives:
ID | DATE | LOCATION | VALUE | SEQ |
---|---|---|---|---|
1 | 2022-09-06 13:09:00.000 | point 2 | 1 | 1 |
1 | 2022-09-06 13:09:00.000 | point 1 | 1 | 0 |
2 | 2022-09-06 13:09:00.000 | point 1 | 4 | 2 |
3 | 2022-09-06 13:10:00.000 | point 1 | 6 | 2 |
Upvotes: 0
Reputation: 175566
Most likely the data has different time component(second/milisecond):
SELECT *
FROM my table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TRUNC(DATE, 'MINUTE'), LOCATION
ORDER BY DATE DESC) = 1
ORDER BY ID, DATE;
ROW_NUMBER() OVER (PARTITION BY ID, DATE,LOCATION ORDER BY ID, DATE)
sorting by ID
has no effect as it is the same per entire partition.
Upvotes: 1