Reputation: 149
I have a table A which contains id
and report_day
and other columns. Also I've a table B which contains also id
, report_day
and also subscribers
. I want to create a VIEW with id, report_day, subscribers
columns. So it's a simple join:
select a.id, a.report_day, b.subscribers from schema.a
left join schema.b on a.id = b.id
and a.report_day = b.report_day
Now i want to add column subscribers_increment
based on subscribers
. But for some days I don't have stats for subscribers
column and it's set to NULL
. subcribers_increment
it's just a (subcribers(current_day) - subscribers (prev_day).
I read some articles and add next statement:
case WHEN row_number() OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 THEN b.subscribers
else ab.subscribers - COALESCE(last_value(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0::bigint::numeric)
END::integer AS subscribers_increment
And now I've next result:
NULL
is still NULL
.
For example it has incorrect increment
for 2021-04-07
. It's increment for 2 days. Can i divide this value from 2021-04-08
by numbers of days (here it's 2) and write same value for 2021-04-07
and 2021-04-08
(or at least for 2021-04-07
where it was null)? And same logic for all days where subscribers
is null?
So i need to follow next rules:
If I see NULL
value in subcribers
column I should go for the next (future) NOT NULL day and grab value for this next day. Substract from this (feature) value last not null value (past - order by date, so we looping back). Divide result of substraction by number of days and fill these rows for column subcribers_increment
.
Is it possible?
UPDATE:
For my data it shoud look like this:
UPDATE v2
UPDATE v3
case (our increment) 25.03-27.03 still is NULL
Upvotes: 0
Views: 59
Reputation: 1270431
The basic idea is:
lag()
to get the previous subscribers
and dates before joining. This assumes that the left join
is the cause of all the NULL
values.NULL
is combined with the next value in one grouping.NULL
s in a group is the denominatorsubscribers
and prev_subscribers
is the numerator.So the idea is:
with t as (
select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
from first_table a left join
(select b.*,
lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
from second_table b
) b
on a.id = b.id and a.report_day = b.report_day
)
select t.*,
(case when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
then t.subscribers - t.prev_subscribers
when t.subscribers is not null
then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
when t.subscribers is null
then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
) / count(*) over (partition by id, grp)
end)
from t;
Here is a db<>fiddle.
Upvotes: 2