Removing consecutive duplicates in a postgresql database where data is in json column

Question

So I have a postgresql table called state_data where there are two columns: datetime and state. The state column is of jsonb type and specifies various state data for a given datetime. Here is an example of the table:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:01:00 | {"temp":75.0,"location":1}
2018-10-31 08:02:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

Over time this table will get very big - particularly I increase sampling frequency - and I really only want to store data where consecutive rows have different temperatures. So the table above would reduce to,

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

I know how to do this if the temperature data were in its own column, but is there a straightforward way to handle this operation and delete all consecutive duplicates based on an item within a json column?

What if I wanted to delete duplicates for both json items? For example,

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

klin · Accepted Answer

Use the window function lag():

select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp'

If the table has a primary key you should use it in the delete command. In the lack of a primary key you can cast state to jsonb:

delete from state_data
where (datetime, state::jsonb) not in (
    select datetime, state::jsonb
    from (
        select datetime, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    where state->>'temp' is distinct from prev->>'temp'
)

Removing consecutive duplicates in a postgresql database where data is in json column

Answers (1)

Related Questions