screechOwl
screechOwl

Reputation: 28159

postgresql join on max(date) and another index

I'm trying to join 2 tables to find the days before the next delivery date.

One table is part number and restock date

part    restock_date
123 01/01/15
123 04/01/15
123 06/30/15
123 09/28/15
234 01/25/15
234 04/25/15
234 07/24/15
234 10/22/15

the other table is part number and analysis date

part    analysis_date
123 02/12/15
123 03/29/15
123 05/13/15
123 06/27/15
234 03/23/15
234 05/07/15
234 06/21/15
234 08/05/15

I'm trying to create a join that shows the next restock date for a given anaylysis date so I can calculate a diff in days between the analysis date and the next restock date.

part    restock_date    analysis_date   days_to_restock
123 04/01/15    02/12/15    48
123 04/01/15    03/29/15    3
123 06/30/15    05/13/15    48
123 06/30/15    06/27/15    3
234 04/25/15    03/23/15    33
234 07/24/15    05/07/15    78
234 07/24/15    06/21/15    33
234 10/22/15    08/05/15    78

Any suggestions?

Upvotes: 1

Views: 1307

Answers (3)

joop
joop

Reputation: 4503

Naive way to find consecutive observations:

SELECT a.*, r.*
        , (r.restock_date - a.analysis_date ) AS nday
FROM tbl_analysis a
LEFT JOIN tbl_restock r
        ON a.part = r.part
        AND r.restock_date >= a.analysis_date           -- restock *after* analysys
        AND NOT EXISTS ( SELECT * FROM tbl_restock nx   -- ,but no observations inbetween 
                WHERE nx.part = a.part
                AND nx.restock_date > a.analysis_date
                AND nx.restock_date < r.restock_date
                )
ORDER BY a.part,a.analysis_date
        ;

Upvotes: 0

Nick
Nick

Reputation: 7451

This assumes that you only care about cases where there has been a restock for the part, which is why this is an inner join. If you want to handle separately those without restocks yet, you will need to make it a left join and add appropriate logic.

Table setup

DROP TABLE IF EXISTS tbl_restock;
CREATE TABLE tbl_restock (part TEXT, restock_date DATE);
INSERT INTO tbl_restock
VALUES
('123','01/01/15'),
('123','04/01/15'),
('123','06/30/15'),
('123','09/28/15'),
('234','01/25/15'),
('234','04/25/15'),
('234','07/24/15'),
('234','10/22/15');

DROP TABLE IF EXISTS tbl_analysis;
CREATE TABLE tbl_analysis (part TEXT, analysis_date DATE);
INSERT INTO tbl_analysis
VALUES
('123','02/12/15'),
('123','03/29/15'),
('123','05/13/15'),
('123','06/27/15'),
('234','03/23/15'),
('234','05/07/15'),
('234','06/21/15'),
('234','08/05/15');

Solution 1

WITH cte AS
(
  SELECT
    ta.part,
    tr.restock_date,
    ta.analysis_date,
    EXTRACT(DAY FROM AGE(tr.restock_date,ta.analysis_date)) AS days_to_restock,
    RANK() OVER (PARTITION BY ta.part, ta.analysis_date ORDER BY AGE(tr.restock_date,ta.analysis_date) ASC) AS dtr_rank
  FROM
    tbl_analysis ta
  INNER JOIN
    tbl_restock tr
    ON  (tr.part = ta.part)
    AND  (tr.restock_date >= ta.analysis_date)
)

SELECT
  part,
  restock_date,
  analysis_date,
  days_to_restock
FROM
  cte
WHERE
  dtr_rank = 1;

UPDATE - Solution 2

Here is another solution using the DISTINCT ON approach inspired by @GordonLinoff's answer here: Multiple unwanted records in Group by clause in Postgress

SELECT
  DISTINCT ON (ta.part, ta.analysis_date)
  ta.part,
  tr.restock_date,
  ta.analysis_date,
  (tr.restock_date - ta.analysis_date) AS days_to_restock
FROM
  tbl_analysis ta
LEFT JOIN
  tbl_restock tr
  ON  (tr.part = ta.part)
  AND (tr.restock_date >= ta.analysis_date)
ORDER BY
  part,
  analysis_date,
  restock_date;

Output

enter image description here

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269973

You could approach this with a lateral join:

select t2.*, t1.restock_date,
       (t1.restock_date - t2.analysis_date) as diff
from table2 t2 left join lateral
     (select t1.*
      from table1 t1
      where t2.part = t1.part and t2.analysis_date <= t1.restock_date
      order by t1.restock_date
      limit 1
     ) t1;

Upvotes: 1

Related Questions