Reputation: 91
I am looking to join three tables together and fill forward null values on the resulting table. Three tables:
Table 1 (raw.fb_historical_data) - this is the main table on which I would like to join the other two on to. Each row of this table is related to one or more rows in the other two tables through a combination of columns id, clk and timestamp (mkt_id and row_id in the other tables).
+---------------------+-----+-----+--------------+
| timestamp | clk | id | some_columns |
+---------------------+-----+-----+--------------+
| 2016-06-19 06:11:13 | 123 | 126 | a |
| 2016-06-19 06:16:13 | 124 | 127 | b |
| 2016-06-19 06:21:13 | 234 | 126 | c |
| 2016-06-19 06:41:13 | 456 | 127 | d |
| ... | ... | ... | ... |
+---------------------+-----+-----+--------------+
Table 2 (raw.fb_runner_changes) - this table essentially gives price changes for a wide range of different markets
+---------------------+--------+--------+-------+
| timestamp | row_id | mkt_id | price |
+---------------------+--------+--------+-------+
| 2016-06-19 06:11:13 | 123 | 126 | 1 |
| 2016-06-19 06:21:13 | 123 | 126 | 2 |
| 2016-06-19 06:41:13 | 123 | 126 | 3 |
| 2016-06-06 18:54:06 | 124 | 127 | 1 |
| 2016-06-06 18:56:06 | 124 | 127 | 2 |
| 2016-06-06 18:57:06 | 124 | 127 | 3 |
| ... | ... | ... | ... |
+---------------------+--------+--------+-------+
Table 3 (raw.fb_runners) - a table with extra information about market changes that I would like to join
+---------------------+--------+--------+---------------+
| timestamp | row_id | mkt_id | other_columns |
+---------------------+--------+--------+---------------+
| 2016-06-19 06:15:13 | 234 | 126 | ab |
| 2016-06-19 06:31:13 | 234 | 126 | cd |
| 2016-06-19 06:56:13 | 234 | 126 | ef |
| 2016-06-06 18:54:06 | 456 | 127 | gh |
| 2016-06-06 18:56:06 | 456 | 127 | jk |
| 2016-06-06 18:57:06 | 456 | 127 | lm |
| ... | ... | ... | ... |
+---------------------+--------+--------+---------------+
Essentially what I want to do is fill NULL information forward (ordered by timestamp) while grouping by market id.
So far, I have tried to join the tables together using
SELECT *
FROM raw.fb_historical_data AS h
LEFT JOIN raw.fb_runner_changes AS rc
ON rc.row_id = h.clk
AND rc.timestamp = h.timestamp
AND rc.mkt_id = h.id
LEFT JOIN raw.fb_runners AS r
ON r.row_id = h.clk
AND r.timestamp = h.timestamp
AND r.mkt_id = h.id
Which has worked as intended, though now there are nulls in the resulting dataset which i'd like to fill in with the last available value for that market.
Upvotes: 1
Views: 1698
Reputation: 1161
This seems to correctly do 'forward fill' in postgres. However I am a postgres newbie so I would appreciate feedback if it's wrong.
DROP TABLE IF EXISTS example;
create temporary table example(id int, str text, val integer);
insert into example values
(1, 'a', null),
(1, null, 1),
(2, 'b', 2),
(2,null ,null );
select * from example
select id, (case
when str is null
then lag(str,1) over (order by id)
else str
end) as str,
(case
when val is null
then lag(val,1) over (order by id)
else val
end) as val
from example
Upvotes: 0
Reputation: 44961
With some of the other SQL dialects, fill forward could be done using the window function last_value
in combination with the instruction ignore nulls
.
Since this is not supported in PostgreSQL (check the note at the bottom of this page), we are using a 2 steps work-around.
select ts, val, val_seq, min(val) over (partition by val_seq) val_fill_fw
from (select ts, val, count(val) over(order by ts) as val_seq
from t
) t
-
+----+----------+---------+-------------+
| ts | val | val_seq | val_fill_fw |
+----+----------+---------+-------------+
| 1 | (null) | 0 | (null) |
| 2 | (null) | 0 | (null) |
| 3 | hello | 1 | hello |
| 4 | (null) | 1 | hello |
| 5 | (null) | 1 | hello |
| 6 | darkness | 2 | darkness |
| 7 | my | 3 | my |
| 8 | (null) | 3 | my |
| 9 | old | 4 | old |
| 10 | (null) | 4 | old |
| 11 | (null) | 4 | old |
| 12 | (null) | 4 | old |
| 13 | friend | 5 | friend |
| 14 | (null) | 5 | friend |
+----+----------+---------+-------------+
Upvotes: 4