funnelCONN
funnelCONN

Reputation: 149

Fill NULL rows based on some mathematical operations

I have a table A which contains id and report_day and other columns. Also I've a table B which contains also id, report_day and also subscribers. I want to create a VIEW with id, report_day, subscribers columns. So it's a simple join:

select a.id, a.report_day, b.subscribers  from schema.a
left join schema.b on a.id = b.id 
and a.report_day = b.report_day 

view

Now i want to add column subscribers_increment based on subscribers. But for some days I don't have stats for subscribers column and it's set to NULL. subcribers_increment it's just a (subcribers(current_day) - subscribers (prev_day).

I read some articles and add next statement:

case WHEN row_number() OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 THEN b.subscribers
else ab.subscribers - COALESCE(last_value(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0::bigint::numeric)
END::integer AS subscribers_increment

And now I've next result:

increment

NULL is still NULL. For example it has incorrect increment for 2021-04-07. It's increment for 2 days. Can i divide this value from 2021-04-08 by numbers of days (here it's 2) and write same value for 2021-04-07 and 2021-04-08 (or at least for 2021-04-07 where it was null)? And same logic for all days where subscribers is null?

So i need to follow next rules: If I see NULL value in subcribers column I should go for the next (future) NOT NULL day and grab value for this next day. Substract from this (feature) value last not null value (past - order by date, so we looping back). Divide result of substraction by number of days and fill these rows for column subcribers_increment.

Is it possible?

UPDATE:

For my data it shoud look like this:

enter image description here

UPDATE v2

After applying script: enter image description here

UPDATE v3

case (our increment) 25.03-27.03 still is NULL enter image description here

Upvotes: 0

Views: 59

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270431

The basic idea is:

  1. Use lag() to get the previous subscribers and dates before joining. This assumes that the left join is the cause of all the NULL values.
  2. Use a cumulative count in reverse to assign a grouping so NULL is combined with the next value in one grouping.
  3. As a result of (2), the count of NULLs in a group is the denominator
  4. As a result of (1) the difference between subscribers and prev_subscribers is the numerator.
  5. The actual calculation requires more window functions and case logic.

So the idea is:

with t as (
      select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
             count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
      from first_table a left join
           (select b.*,
                   lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
                   lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
            from second_table b
           ) b
           on a.id = b.id and a.report_day = b.report_day 
     )
select t.*,
       (case when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
             then t.subscribers - t.prev_subscribers
             when t.subscribers is not null
             then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
             when t.subscribers is null
             then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
                  ) / count(*) over (partition by id, grp)
        end)
from t;

Here is a db<>fiddle.

Upvotes: 2

Related Questions