Reputation: 655
So I have a postgresql table called state_data
where there are two columns: datetime
and state
. The state
column is of jsonb type and specifies various state data for a given datetime. Here is an example of the table:
datetime | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:01:00 | {"temp":75.0,"location":1}
2018-10-31 08:02:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}
Over time this table will get very big - particularly I increase sampling frequency - and I really only want to store data where consecutive rows have different temperatures. So the table above would reduce to,
datetime | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}
I know how to do this if the temperature data were in its own column, but is there a straightforward way to handle this operation and delete all consecutive duplicates based on an item within a json column?
What if I wanted to delete duplicates for both json items? For example,
datetime | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}
Upvotes: 1
Views: 339
Reputation: 121624
Use the window function lag():
select datetime, state
from (
select datetime, state, lag(state) over (order by datetime) as prev
from state_data
) s
where state->>'temp' is distinct from prev->>'temp'
If the table has a primary key you should use it in the delete command. In the lack of a primary key you can cast state
to jsonb:
delete from state_data
where (datetime, state::jsonb) not in (
select datetime, state::jsonb
from (
select datetime, state, lag(state) over (order by datetime) as prev
from state_data
) s
where state->>'temp' is distinct from prev->>'temp'
)
Upvotes: 2