Reputation: 3
I am trying to group rows on a postgres query. I have one row of data that has a field with a line number in it. The next 3 rows are null. I want to set them to the same value as the first row. Then the 5th row repeats the process. There may be 1 or more rows with null as the line number.
Here is a sample of the data with a simple query to pull the rows I want, ordered by line_num:
item_id | ordered_qty | unit | line_num | group_line_num | code |
---|---|---|---|---|---|
NP4484140T | 11 | PST^ | 7 | 7 | |
COL48LPT | 6 | PST | 8 | null | NP4484140T |
COL48EPT | 4 | PST | 9 | null | NP4484140T |
COL48BPT | 1 | PST | 10 | null | NP4484140T |
VP5578135T | 1 | PST^ | 52 | 52 | |
ONTP48CPT | 1 | PST | 53 | null | VP5578135T |
I want group_line_num to be 7 where it is null in the first set and 52 on the 4th null. The groups can also be created based on where the unit changes with the PST^ being the defining line of the group. I have tried using lag but since the number of null lines is variable I am not sure how to make it work.
Upvotes: 0
Views: 59
Reputation: 3180
The described results can be easily achieved using MAX
as a window function.
The following code establishes a table of example values:
CREATE TABLE examples (
item_id TEXT NOT NULL,
ordered_qty INTEGER NOT NULL,
unit TEXT NOT NULL,
line_num INTEGER NOT NULL,
group_line_num INTEGER,
code TEXT
);
INSERT INTO
examples (
item_id,
ordered_qty,
unit,
line_num,
group_line_num,
code
)
VALUES
('NP4484140T', 11, 'PST^', 7, 7, ''),
('COL48LPT', 6, 'PST', 8, NULL, 'NP4484140T'),
('COL48EPT', 4, 'PST', 9, NULL, 'NP4484140T'),
('COL48BPT', 1, 'PST', 10, NULL, 'NP4484140T'),
('VP5578135T', 1, 'PST^', 52, 52, ''),
('ONTP48CPT', 1, 'PST', 53, NULL, 'VP5578135T');
To determine the group_line_num
using it's most recent non-NULL value, run the following:
SELECT
item_id,
ordered_qty,
unit,
line_num,
MAX(group_line_num) OVER (ORDER BY line_num) AS group_line_num,
code
FROM
examples
ORDER BY
line_num;
The following query derives group_line_num
from the most recent line_num
associated with unit = 'PST^'
:
SELECT
item_id,
ordered_qty,
unit,
line_num,
MAX(line_num) FILTER (WHERE unit = 'PST^') OVER (ORDER BY line_num) AS group_line_num,
code
FROM
examples
ORDER BY
line_num;
The two queries return identical results:
item_id | ordered_qty | unit | line_num | group_line_num | code |
---|---|---|---|---|---|
NP4484140T | 11 | PST^ | 7 | 7 | null |
COL48LPT | 6 | PST | 8 | 7 | NP4484140T |
COL48EPT | 4 | PST | 9 | 7 | NP4484140T |
COL48BPT | 1 | PST | 10 | 7 | NP4484140T |
VP5578135T | 1 | PST^ | 52 | 52 | |
ONTP48CPT | 1 | PST | 53 | 52 | VP5578135T |
Upvotes: 0
Reputation: 952
Although it is possible to solve this with window functions it is not entirely straightforward and I think that the simplest solution is to self join the previous rows and identify the row with the highest value of group_line_num which can be used to replace a null value in the current row.
create table example (
item_id varchar not null,
ordered_qty integer not null,
unit varchar not null,
line_num integer not null,
group_line_num integer,
code varchar not null
);
insert into example (item_id, ordered_qty, unit, line_num, group_line_num, code)
values ('NP4484140T', 11, 'PST^', 7, 7, ''),
('COL48LPT', 6, 'PST', 8, null, 'NP4484140T'),
('COL48EPT', 4, 'PST', 9, null, 'NP4484140T'),
('COL48BPT', 1, 'PST', 10, null, 'NP4484140T'),
('VP5578135T', 1, 'PST^', 52, 52, ''),
('ONTP48CPT', 1, 'PST', 53, null, 'VP5578135T');
select item_id,
ordered_qty,
unit,
line_num,
coalesce(example.group_line_num, previous.group_line_num) as group_line_num,
code
from example
left join lateral (select previous.group_line_num
from example as previous
where previous.line_num < example.line_num
order by previous.group_line_num desc nulls last
limit 1) as previous on true
order by line_num;
Upvotes: 0