getunstuck
getunstuck

Reputation: 63

Delete duplicate rows of a table based on the value of the previous row

Let's say I have a SQL Table as such:

serial(pk) name status location time_updated
1 'joe' 'at_home' 'USA' 01:30
2 'jane' 'at_store' 'USA' 02:30
3 'joe' 'driving' 'USA' 12:15
4 'joe' 'driving' 'USA' 13:30
5 'joe' 'at_store' 'USA' 15:00
5 'joe' 'at_store' 'USA' 15:15
6 'joe' 'driving' 'USA' 16:00
7 'joe' 'driving' 'USA' 17:10
8 'joe' 'at_home' 'USA' 20:00

In this table there can be three different status': "at_home", "at_store" and "driving".

I want a chronological order of movements by each person. For instance, for joe, this would look like at_home -> driving -> driving -> at_store -> at_store -> driving -> driving -> at_home

However, I want to remove any duplicate 'driving' status' and only keep the earliest. For instance, for joe, this would look like at_home -> driving -> at_store -> at_store -> driving -> at_home. I do not want to remove duplicate 'at_home' or 'at_store'

In this example, I want to keep the 'driving' status from 12:15 and keep the 'driving' status at 16:00 while removing the duplicate ones that follow.

I want to do this for each person specifically so when I do an 'order by time_updated', I can see all entries for that person in order.

If I query the resulting table using "select * from db where name = 'joe' order by time_updated", my ideal result would be:

serial(pk) name status location time_updated
1 'joe' 'at_home' 'USA' 01:30
3 'joe' 'driving' 'USA' 12:15
5 'joe' 'at_store' 'USA' 15:00
5 'joe' 'at_store' 'USA' 15:15
6 'joe' 'driving' 'USA' 16:00
8 'joe' 'at_home' 'USA' 20:00

Is there a way this can be done in postgres?

Thank you

Upvotes: 0

Views: 333

Answers (3)

nachospiu
nachospiu

Reputation: 2049

You don't need to delete contiguous rows with status = 'driving' (by name), you can exclude them from the result with this query (using window function LAG()):

WITH sq AS (SELECT *,
                   CASE WHEN status = 'driving' AND status = LAG(status) OVER (PARTITION BY name ORDER BY time_updated) THEN 1 ELSE 0 END contiguous_driving
            FROM t)
            
SELECT * 
FROM sq
WHERE name = 'joe' AND contiguous_driving = 0
ORDER BY time_updated;

If you still want to delete them, you can use the same subquery:

WITH sq AS (SELECT *,
                   CASE WHEN status = 'driving' AND status = LAG(status) OVER (PARTITION BY name ORDER BY time_updated) THEN 1 ELSE 0 END contiguous_driving
            FROM t)
            
DELETE FROM t
WHERE serial IN (SELECT serial
                 FROM sq 
                 WHERE contiguous_driving = 1);

If serial column is the primary key, it cannot have duplicated values.

Upvotes: 2

Manuel Duarte
Manuel Duarte

Reputation: 1075

First you get your results and order them by dates in reverse order to get the last row, you might even limit this search to a few of them or just the last one.

SELECT * FROM table ORDER BY time_updated DESC LIMIT 1;

Then if the status is 'driving'

UPDATE table SET time_updated = currenttime WHERE id = currentID

You might be able to do this automatically with triggers and procedures directly on postgres but that's beyond this question

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522117

You could use a delete with exists logic:

DELETE
FROM yourTable t1
WHERE status = 'driving' AND
      NOT EXISTS (SELECT 1 FROM yourTable t2
                  WHERE t2.name = t1.name AND
                        t2.time_updated < t1.time_updated);

Upvotes: 1

Related Questions