Reputation: 63

Delete duplicate rows of a table based on the value of the previous row

Let's say I have a SQL Table as such:

serial(pk)	name	status	location	time_updated
1	'joe'	'at_home'	'USA'	01:30
2	'jane'	'at_store'	'USA'	02:30
3	'joe'	'driving'	'USA'	12:15
4	'joe'	'driving'	'USA'	13:30
5	'joe'	'at_store'	'USA'	15:00
5	'joe'	'at_store'	'USA'	15:15
6	'joe'	'driving'	'USA'	16:00
7	'joe'	'driving'	'USA'	17:10
8	'joe'	'at_home'	'USA'	20:00

In this table there can be three different status': "at_home", "at_store" and "driving".

I want a chronological order of movements by each person. For instance, for joe, this would look like at_home -> driving -> driving -> at_store -> at_store -> driving -> driving -> at_home

However, I want to remove any duplicate 'driving' status' and only keep the earliest. For instance, for joe, this would look like at_home -> driving -> at_store -> at_store -> driving -> at_home. I do not want to remove duplicate 'at_home' or 'at_store'

In this example, I want to keep the 'driving' status from 12:15 and keep the 'driving' status at 16:00 while removing the duplicate ones that follow.

I want to do this for each person specifically so when I do an 'order by time_updated', I can see all entries for that person in order.

If I query the resulting table using "select * from db where name = 'joe' order by time_updated", my ideal result would be:

serial(pk)	name	status	location	time_updated
1	'joe'	'at_home'	'USA'	01:30
3	'joe'	'driving'	'USA'	12:15
5	'joe'	'at_store'	'USA'	15:00
5	'joe'	'at_store'	'USA'	15:15
6	'joe'	'driving'	'USA'	16:00
8	'joe'	'at_home'	'USA'	20:00

Is there a way this can be done in postgres?

Thank you

Upvotes: 0

Answers (3)

nachospiu

Reputation: 2049

You don't need to delete contiguous rows with status = 'driving' (by name), you can exclude them from the result with this query (using window function LAG()):

WITH sq AS (SELECT *,
                   CASE WHEN status = 'driving' AND status = LAG(status) OVER (PARTITION BY name ORDER BY time_updated) THEN 1 ELSE 0 END contiguous_driving
            FROM t)
            
SELECT * 
FROM sq
WHERE name = 'joe' AND contiguous_driving = 0
ORDER BY time_updated;

If you still want to delete them, you can use the same subquery:

WITH sq AS (SELECT *,
                   CASE WHEN status = 'driving' AND status = LAG(status) OVER (PARTITION BY name ORDER BY time_updated) THEN 1 ELSE 0 END contiguous_driving
            FROM t)
            
DELETE FROM t
WHERE serial IN (SELECT serial
                 FROM sq 
                 WHERE contiguous_driving = 1);

If serial column is the primary key, it cannot have duplicated values.

Upvotes: 2

Manuel Duarte

Reputation: 1075

First you get your results and order them by dates in reverse order to get the last row, you might even limit this search to a few of them or just the last one.

SELECT * FROM table ORDER BY time_updated DESC LIMIT 1;

Then if the status is 'driving'

UPDATE table SET time_updated = currenttime WHERE id = currentID

You might be able to do this automatically with triggers and procedures directly on postgres but that's beyond this question

Upvotes: 1

Tim Biegeleisen

Reputation: 522117

You could use a delete with exists logic:

DELETE
FROM yourTable t1
WHERE status = 'driving' AND
      NOT EXISTS (SELECT 1 FROM yourTable t2
                  WHERE t2.name = t1.name AND
                        t2.time_updated < t1.time_updated);

Upvotes: 1

Delete duplicate rows of a table based on the value of the previous row

Answers (3)

Related Questions