Reputation: 853
Consider the following table that stores update history of some attributes of certain objects, organized by effective
and published
dates:
create table update_history(
obj_id integer,
effective date,
published date,
attr1 text,
attr2 integer,
attr3 boolean,
primary key(obj_id, effective, published)
);
insert into update_history values
(1, '2021-01-01', '2021-01-01', 'foo', null, null),
(1, '2021-01-01', '2021-01-02', null, 1, false),
(1, '2021-01-02', '2021-01-01', 'foo', 1, false),
(1, '2021-01-02', '2021-01-02', 'bar', 1, false),
(1, '2021-01-03', '2021-01-01', 'bar', 1, true),
(1, '2021-01-04', '2021-01-01', 'bar', 1, true),
(1, '2021-01-05', '2021-01-01', 'bar', 2, true),
(1, '2021-01-05', '2021-01-02', 'bar', 1, true),
(1, '2021-01-05', '2021-01-03', 'bar', 1, true),
(1, '2021-01-06', '2021-01-04', 'bar', 1, true)
;
I need to write a PostgreSQL query that will simplify the history view for a given obj_id
by excluding those update records that did not change any attributes from the immediately preceding update as ordered by effective
and published
columns. In essence those would be rows ## 6, 9 and 10, marked in italic in the table below:
# | obj_id | effective | published | attr1 | attr2 | attr3 |
---|---|---|---|---|---|---|
1 | 1 | 2021-01-01 | 2021-01-01 | foo | (null) | (null) |
2 | 1 | 2021-01-01 | 2021-01-02 | (null) | 1 | false |
3 | 1 | 2021-01-02 | 2021-01-01 | foo | 1 | false |
4 | 1 | 2021-01-02 | 2021-01-02 | bar | 1 | false |
5 | 1 | 2021-01-03 | 2021-01-01 | bar | 1 | true |
6 | 1 | 2021-01-04 | 2021-01-01 | bar | 1 | true |
7 | 1 | 2021-01-05 | 2021-01-01 | bar | 2 | true |
8 | 1 | 2021-01-05 | 2021-01-02 | bar | 1 | true |
9 | 1 | 2021-01-05 | 2021-01-03 | bar | 1 | true |
10 | 1 | 2021-01-06 | 2021-01-04 | bar | 1 | true |
Keep in mind that in the real life case there are way more attributes to deal with and I don't want the query to get too messy.
The closest I got to the desired result was using the rank
window function:
select
obj_id, effective, published,
attr1, attr2, attr3
from (
select *,
rank() over (
partition by attr1, attr2, attr3
order by effective, published
) as rank
from update_history
where obj_id = 1) as d
where rank = 1
order by effective, published;
That results in this:
obj_id | effective | published | attr1 | attr2 | attr3 |
---|---|---|---|---|---|
1 | 2021-01-01 | 2021-01-01 | foo | (null) | (null) |
1 | 2021-01-01 | 2021-01-02 | (null) | 1 | false |
1 | 2021-01-02 | 2021-01-01 | foo | 1 | false |
1 | 2021-01-02 | 2021-01-02 | bar | 1 | false |
1 | 2021-01-03 | 2021-01-01 | bar | 1 | true |
1 | 2021-01-05 | 2021-01-01 | bar | 2 | true |
As you can see, row #8 from the original table is erroneously excluded, although it changed attr2
from the its previous row, #7. Apparently, the problem is that partitioning is applied before sorting in the window definition.
I wonder if there is another way to accomplish this with a single PostgresSQL query.
Upvotes: 1
Views: 62
Reputation:
I would use the lag()
for this:
select *
from (
select obj_id, effective, published,
attr1, attr2, attr3,
(attr1, attr2, attr3) is distinct from lag( (attr1,attr2,attr3) ) over (partition by obj_id order by effective, published) as is_different
from update_history
) t
where is_different
Upvotes: 1