Reputation: 11
I have a table like this
ts item infoA infoB
2018-02-03 12:00:00 A null null
2018-02-03 12:01:00 null A1 null
2018-02-03 12:02:00 A null null
2018-02-03 12:03:00 null null null
2018-02-03 12:04:00 null A2 null
2018-02-03 12:05:00 null null null
2018-02-03 12:06:00 B null null
2018-02-03 12:07:00 null null B1
2018-02-03 12:08:00 null null null
I want to fill the null data with time series only on related item
ts item infoA infoB
2018-02-03 12:00:00 A null null
2018-02-03 12:01:00 A A1 null
2018-02-03 12:02:00 A A1 null
2018-02-03 12:03:00 A A1 null
2018-02-03 12:04:00 A A2 null
2018-02-03 12:05:00 A A2 null
2018-02-03 12:06:00 B null null
2018-02-03 12:07:00 B null B1
2018-02-03 12:08:00 B null B1
i found a AGGREGATE function GapFill() from this Using the function, I can get the table from
select t1.ts, t1.item, t2.infoA, t3.infoB
from
(select ts,gapfill(item) OVER (ORDER BY ts)) t1
LEFT JOIN (select ts,gapfill(infoA) OVER (ORDER BY ts) as infoA) on (t1.ts = t2.ts and t1.item='A') t2
LEFT JOIN (select ts,gapfill(infoB) OVER (ORDER BY ts) as infoB) on (t1.ts = t3.ts and t1.item='B') t3
how can the query be simplified in case i have many column.
Upvotes: 0
Views: 225
Reputation: 1269953
What you want is the ignore nulls
option on lag()
. But Postgres does not support it (yet).
Perhaps the simplest way is a correlated subquery:
select t.ts,
coalesce(item,
(select t2.item
from t t2
where t2.ts < t.ts and t2.item is not null
order by t2.ts desc
fetch first 1 row only
)
) as item,
coalesce(itemA,
(select t2.itemA
from t t2
where t2.ts < t.ts and t2.itemA is not null
order by t2.ts desc
fetch first 1 row only
)
) as itemA,
coalesce(itemB,
(select t2.itemB
from t t2
where t2.ts < t.ts and t2.itemB is not null
order by t2.ts desc
fetch first 1 row only
)
) as itemB
from t;
If you know the values are increasing or decreasing monotonically, you can use max()
or min()
.
Another method using window functions uses the same idea. Identify the groups of rows with the same value by doing a cumulative count. Then spread the value over the rows:
select t.ts,
max(item) over (partition by grp_item) as item,
max(itemA) over (partition by grp_item) as itemA,
max(itemB) over (partition by grp_item) as itemB
from (select t.*,
count(item) over (order by ts) as grp_item,
count(itemA) over (order by ts) as grp_itemA,
count(itemB) over (order by ts) as grp_itemB
from t
) t;
Upvotes: 1