Reputation: 63
Let's say I have a SQL Table as such:
serial(pk) | name | status | location | time_updated |
---|---|---|---|---|
1 | 'joe' | 'at_home' | 'USA' | 01:30 |
2 | 'jane' | 'at_store' | 'USA' | 02:30 |
3 | 'joe' | 'driving' | 'USA' | 12:15 |
4 | 'joe' | 'driving' | 'USA' | 13:30 |
5 | 'joe' | 'at_store' | 'USA' | 15:00 |
5 | 'joe' | 'at_store' | 'USA' | 15:15 |
6 | 'joe' | 'driving' | 'USA' | 16:00 |
7 | 'joe' | 'driving' | 'USA' | 17:10 |
8 | 'joe' | 'at_home' | 'USA' | 20:00 |
In this table there can be three different status': "at_home", "at_store" and "driving".
I want a chronological order of movements by each person. For instance, for joe, this would look like at_home -> driving -> driving -> at_store -> at_store -> driving -> driving -> at_home
However, I want to remove any duplicate 'driving' status' and only keep the earliest. For instance, for joe, this would look like at_home -> driving -> at_store -> at_store -> driving -> at_home. I do not want to remove duplicate 'at_home' or 'at_store'
In this example, I want to keep the 'driving' status from 12:15 and keep the 'driving' status at 16:00 while removing the duplicate ones that follow.
I want to do this for each person specifically so when I do an 'order by time_updated', I can see all entries for that person in order.
If I query the resulting table using "select * from db where name = 'joe' order by time_updated"
, my ideal result would be:
serial(pk) | name | status | location | time_updated |
---|---|---|---|---|
1 | 'joe' | 'at_home' | 'USA' | 01:30 |
3 | 'joe' | 'driving' | 'USA' | 12:15 |
5 | 'joe' | 'at_store' | 'USA' | 15:00 |
5 | 'joe' | 'at_store' | 'USA' | 15:15 |
6 | 'joe' | 'driving' | 'USA' | 16:00 |
8 | 'joe' | 'at_home' | 'USA' | 20:00 |
Is there a way this can be done in postgres?
Thank you
Upvotes: 0
Views: 333
Reputation: 2049
You don't need to delete contiguous rows with status
= 'driving' (by name
), you can exclude them from the result with this query (using window function LAG()
):
WITH sq AS (SELECT *,
CASE WHEN status = 'driving' AND status = LAG(status) OVER (PARTITION BY name ORDER BY time_updated) THEN 1 ELSE 0 END contiguous_driving
FROM t)
SELECT *
FROM sq
WHERE name = 'joe' AND contiguous_driving = 0
ORDER BY time_updated;
If you still want to delete them, you can use the same subquery:
WITH sq AS (SELECT *,
CASE WHEN status = 'driving' AND status = LAG(status) OVER (PARTITION BY name ORDER BY time_updated) THEN 1 ELSE 0 END contiguous_driving
FROM t)
DELETE FROM t
WHERE serial IN (SELECT serial
FROM sq
WHERE contiguous_driving = 1);
If serial
column is the primary key, it cannot have duplicated values.
Upvotes: 2
Reputation: 1075
First you get your results and order them by dates in reverse order to get the last row, you might even limit this search to a few of them or just the last one.
SELECT * FROM table ORDER BY time_updated DESC LIMIT 1;
Then if the status is 'driving'
UPDATE table SET time_updated = currenttime WHERE id = currentID
You might be able to do this automatically with triggers and procedures directly on postgres but that's beyond this question
Upvotes: 1
Reputation: 522117
You could use a delete with exists logic:
DELETE
FROM yourTable t1
WHERE status = 'driving' AND
NOT EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.name = t1.name AND
t2.time_updated < t1.time_updated);
Upvotes: 1