Tom Johnson
Tom Johnson

Reputation: 13

SQL - Grab record with minimum date until cost changes

Got a record set where I'm looking to grab the record with the minimum date based on PO until the cost changes. Any help would be appreciated. There are around 7 million records and the current cursor just isn't cutting it.

Sample:

PO       log_Ts                         cost
123      2012-06-26-10.37.44.035385     2.5896
123      2012-06-27-02.16.14.706817     2.5896
123      2012-06-26-10.28.57.540731     2.591
123      2012-06-26-10.37.43.948940     2.5896
123      2012-06-26-10.37.43.421713     2.5896
123      2012-06-26-18.34.37.191917     2.5896
123      2012-06-27-02.16.14.705622     2.5896
123      2012-06-27-04.33.18.264742     2.5896
123      2012-06-26-10.37.44.007667     2.5896
123      2012-06-26-10.37.43.706207     2.5896
123      2012-06-26-10.26.56.767121     2.5896
123      2012-06-26-10.37.43.919248     2.5896

Looking to grab:
PO       log_Ts                         cost
123      2012-06-26-10.26.56.767121     2.5896
123      2012-06-26-10.28.57.540731     2.591
123      2012-06-26-10.37.43.421713     2.5896

Upvotes: 1

Views: 197

Answers (4)

Fred Sobotka
Fred Sobotka

Reputation: 5332

If your DBMS supports the LAG() function, it's possible to use the CTE approach in t-clausen.dk's and Vikdor's answers, but without needing a self join.

WITH t 
AS 
(
    SELECT PO, log_Ts, Cost, 
    LAG( Cost ) OVER( PARTITION BY PO ORDER BY log_Ts) AS prevcost
    FROM po_log_events
) 
SELECT PO, log_Ts, Cost
FROM t
WHERE prevcost IS NULL 
OR prevcost <> cost       
ORDER BY PO, log_Ts

If ROW_NUMBER() and common tables expressions aren't available for your database (i.e., your database is MySQL), the same result can be achieved through correlated subqueries:

SELECT DISTINCT p.PO, p.log_Ts, p.Cost
FROM po_log_events p
WHERE NOT EXISTS 
    ( SELECT 1 FROM po_log_events p2 
      WHERE p2.PO = p.PO AND p2.log_Ts < p.log_Ts )
OR NOT EXISTS 
    ( SELECT 1 FROM po_log_events p3 
      WHERE p3.PO = p.PO 
      AND p3.log_Ts = 
          (SELECT MAX(p4.log_ts) 
           FROM po_log_events p4
           WHERE p4.PO = p.PO
           AND p4.log_Ts < p.log_Ts
           )
      AND p3.Cost = p.Cost
     )

The DISTINCT can be removed if the table has a unique index on (PO, log_Ts)

Upvotes: 0

t-clausen.dk
t-clausen.dk

Reputation: 44326

;with a as
(
  select po, log_ts, cost, row_number() over (partition by po order by log_ts) rn
  from <table>
), b as
(
  select po, log_ts, cost, 1 grp, rn from a where rn = 1
  union all
  select a.po, a.log_ts, a.cost, case when a.cost = b.cost then b.grp else b.grp+1 end, a.rn
  from a
  join b on a.rn = b.rn+1 and a.po = b.po
)
select po, min(log_ts) log_ts, cost 
from b
group by po, cost, grp

Upvotes: 0

Vikdor
Vikdor

Reputation: 24134

You can try the following:

  1. SELECT the rows in the order of log_ts and associate a row number with each row.
  2. SELF JOIN the result in #1 on firstTable.rownum = secondTable.rownum -1. Each row in the worktable will contain the current and next record.
  3. Apply the WHERE clause such that the cost value is different and you are interested in the tuple from the second set in the SELF JOIN.

E.g.

RowNum  PO       log_Ts                         cost    RowNum  PO       log_Ts                         cost
1       123      2012-06-26-10.37.44.035385     2.5896  2       123      2012-06-27-02.16.14.706817     2.5896
2       123      2012-06-27-02.16.14.706817     2.5896  3       123      2012-06-26 10.28.57.540731     2.591
3       123      2012-06-26-10.28.57.540731     2.591   4       123      2012-06-26-10.37.43.948940     2.5896

Query:

WITH T (PO, Log_Ts, Cost, RowNum) 
AS 
(
SELECT PO, Log_Ts, Cost, Row_Number() OVER(ORDER BY Log_Ts) FROM PO_INFO
) 

SELECT T2.* 
FROM 
    T t1
    JOIN T t2
    ON t1.rownum = t2.rownum - 1 AND t1.cost != t2.cost

Result:

123      2012-06-26 10.28.57.540731     2.591    2
123      2012-06-26-10.37.43.948940     2.5896   3

HTH.

Upvotes: 1

Dylan Smith
Dylan Smith

Reputation: 22245

This is something that really lends itself to procedural logic (code) rather than set-based logic (sql). As such I would suggest doing it in code if possible where you can simply sort the data-set and loop through it.

If code is not an option you can do the same thing in a stored procedure using a cursor and a loop.

Upvotes: 0

Related Questions