dna dad
dna dad

Reputation: 11

filling the null data by time series

I have a table like this

ts                    item          infoA         infoB         
2018-02-03 12:00:00   A             null          null          
2018-02-03 12:01:00   null          A1            null          
2018-02-03 12:02:00   A             null          null          
2018-02-03 12:03:00   null          null          null          
2018-02-03 12:04:00   null          A2            null           
2018-02-03 12:05:00   null          null          null         
2018-02-03 12:06:00   B             null          null         
2018-02-03 12:07:00   null          null          B1         
2018-02-03 12:08:00   null          null          null         

I want to fill the null data with time series only on related item

ts                    item          infoA         infoB         
2018-02-03 12:00:00   A             null          null          
2018-02-03 12:01:00   A             A1            null          
2018-02-03 12:02:00   A             A1            null          
2018-02-03 12:03:00   A             A1            null          
2018-02-03 12:04:00   A             A2            null           
2018-02-03 12:05:00   A             A2            null         
2018-02-03 12:06:00   B             null          null         
2018-02-03 12:07:00   B             null          B1         
2018-02-03 12:08:00   B             null          B1            

i found a AGGREGATE function GapFill() from this Using the function, I can get the table from

select t1.ts, t1.item, t2.infoA, t3.infoB 
from 
(select ts,gapfill(item) OVER (ORDER BY ts)) t1 
LEFT JOIN (select ts,gapfill(infoA) OVER (ORDER BY ts) as infoA) on (t1.ts = t2.ts and t1.item='A') t2 
LEFT JOIN (select ts,gapfill(infoB) OVER (ORDER BY ts) as infoB) on (t1.ts = t3.ts and t1.item='B') t3

how can the query be simplified in case i have many column.

Upvotes: 0

Views: 225

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269953

What you want is the ignore nulls option on lag(). But Postgres does not support it (yet).

Perhaps the simplest way is a correlated subquery:

select t.ts,
       coalesce(item,
                (select t2.item
                 from t t2
                 where t2.ts < t.ts and t2.item is not null
                 order by t2.ts desc
                 fetch first 1 row only
                )
               ) as item,
       coalesce(itemA,
                (select t2.itemA
                 from t t2
                 where t2.ts < t.ts and t2.itemA is not null
                 order by t2.ts desc
                 fetch first 1 row only
                )
               ) as itemA,
       coalesce(itemB,
                (select t2.itemB
                 from t t2
                 where t2.ts < t.ts and t2.itemB is not null
                 order by t2.ts desc
                 fetch first 1 row only
                )
               ) as itemB
from t;

If you know the values are increasing or decreasing monotonically, you can use max() or min().

Another method using window functions uses the same idea. Identify the groups of rows with the same value by doing a cumulative count. Then spread the value over the rows:

select t.ts,
           max(item) over (partition by grp_item) as item,
           max(itemA) over (partition by grp_item) as itemA,
           max(itemB) over (partition by grp_item) as itemB
from (select t.*,
                    count(item) over (order by ts) as grp_item,
                    count(itemA) over (order by ts) as grp_itemA,
                    count(itemB) over (order by ts) as grp_itemB
         from t
       ) t;

Upvotes: 1

Related Questions