Reputation: 51
I have a table with indexes of imported files, with dates and branches of each imported files.
Now I need to do a consolidation of multiple branches, so, that I have to duplicate the information from some branches when we have holidays, so that the data is consistent, basically I need to fill these gaps with the latest available information.
I tried doing some self-joins with ranking in order to shift between them and get the previous data, but it didn't work.
What I have is a table:
rundate, branch, imported
2015-04-01, PL1, TRUE
2015-04-01, ES1, TRUE
2015-04-01, CZ4, TRUE
2015-04-02, PL1, TRUE
2015-04-02, ES1, TRUE
2015-04-02, CZ4, TRUE
2015-04-03, ES1, TRUE
2015-04-03, CZ4, TRUE
In this example, I would like to make a query that returns:
gap_date, branch, real_date
2015-04-03, PL1, 2015-04-02
This table is quite small (couple thousand lines), so, performance shouldn't be a big issue.
Any idea on how can I achieve that?
Now I am using a function that receives the rundate and branch of the gap dates as parameters, and answers the latest before the date passed as parameter (using max(rundate) where rundate <= '$1')
Thanks!
Upvotes: 0
Views: 254
Reputation: 4523
This is the carthesian product of both domains
solution from @voycheck except the records that actually exist.
WITH br AS (
SELECT DISTINCT branch AS branch FROM tbl
)
, mima AS (
SELECT MIN(rundate) as mi
, MAX(rundate) as ma
FROM tbl)
, rng AS (
SELECT generate_series( mima.mi, mima.ma, '1 day'::interval)::date AS rundate
FROM mima
)
SELECT * FROM rng
JOIN br ON NOT EXISTS ( -- cartesian product EXCEPT
SELECT *
FROM tbl t
WHERE t.branch = br.branch
AND t.rundate = rng.rundate
)
;
Upvotes: 0
Reputation: 7890
you can use outer join
, subquery
and cross join
:
Schema:
create table tbl(rundate date,
branch varchar(10),
imported bool);
insert into tbl values('2015-04-01', 'PL1', TRUE),
('2015-04-01', 'ES1', TRUE),
('2015-04-01', 'CZ4', TRUE),
('2015-04-02', 'PL1', TRUE),
('2015-04-02', 'ES1', TRUE),
('2015-04-02', 'CZ4', TRUE),
('2015-04-03', 'ES1', TRUE),
('2015-04-03', 'CZ4', TRUE);
Query:
select q.rundate as gap_date,q.branch,
(select max(tt.rundate)
from tbl tt
where tt.rundate<q.rundate and tt.branch=q.branch)
as real_date
from tbl t
right outer join(
select rundate,branch from (
select distinct rundate from tbl) t1
cross join (
select distinct branch from tbl)t2
)q
on t.rundate=q.rundate and t.branch=q.branch
where t.branch is null
Result:
gap_date branch real_date
2015-04-03 PL1 2015-04-02
Upvotes: 1
Reputation: 1118
You have to select all unique dates in one set, all unique branches in another - make a cartesian product of it - and then you can check for which combination you have a gap... this is what I mean:
CREATE TEMPORARY TABLE _matrix
SELECT
t1.rundate,
t2.branch
(SELECT DISTINCT rundate FROM yourtable) t1,
(SELECT DISTINCT branch FROM yourtable) t2
Then you can find gaps using LEFT JOIN and "WHERE ... IS NULL" statement:
SELECT
m.rundate,
m.branch
FROM _matrix m
LEFT JOIN yourtable yt ON(yt.rundate = m.rundate AND yt.branch = m.branch)
WHERE yt.rundate IS NULL
Of course the same can be achieved without temporary tables - using just a subquery. Especially when the performance is not crucial.
Upvotes: 0