Reputation: 13
Got a record set where I'm looking to grab the record with the minimum date based on PO until the cost changes. Any help would be appreciated. There are around 7 million records and the current cursor just isn't cutting it.
Sample:
PO log_Ts cost
123 2012-06-26-10.37.44.035385 2.5896
123 2012-06-27-02.16.14.706817 2.5896
123 2012-06-26-10.28.57.540731 2.591
123 2012-06-26-10.37.43.948940 2.5896
123 2012-06-26-10.37.43.421713 2.5896
123 2012-06-26-18.34.37.191917 2.5896
123 2012-06-27-02.16.14.705622 2.5896
123 2012-06-27-04.33.18.264742 2.5896
123 2012-06-26-10.37.44.007667 2.5896
123 2012-06-26-10.37.43.706207 2.5896
123 2012-06-26-10.26.56.767121 2.5896
123 2012-06-26-10.37.43.919248 2.5896
Looking to grab:
PO log_Ts cost
123 2012-06-26-10.26.56.767121 2.5896
123 2012-06-26-10.28.57.540731 2.591
123 2012-06-26-10.37.43.421713 2.5896
Upvotes: 1
Views: 197
Reputation: 5332
If your DBMS supports the LAG() function, it's possible to use the CTE approach in t-clausen.dk's and Vikdor's answers, but without needing a self join.
WITH t
AS
(
SELECT PO, log_Ts, Cost,
LAG( Cost ) OVER( PARTITION BY PO ORDER BY log_Ts) AS prevcost
FROM po_log_events
)
SELECT PO, log_Ts, Cost
FROM t
WHERE prevcost IS NULL
OR prevcost <> cost
ORDER BY PO, log_Ts
If ROW_NUMBER() and common tables expressions aren't available for your database (i.e., your database is MySQL), the same result can be achieved through correlated subqueries:
SELECT DISTINCT p.PO, p.log_Ts, p.Cost
FROM po_log_events p
WHERE NOT EXISTS
( SELECT 1 FROM po_log_events p2
WHERE p2.PO = p.PO AND p2.log_Ts < p.log_Ts )
OR NOT EXISTS
( SELECT 1 FROM po_log_events p3
WHERE p3.PO = p.PO
AND p3.log_Ts =
(SELECT MAX(p4.log_ts)
FROM po_log_events p4
WHERE p4.PO = p.PO
AND p4.log_Ts < p.log_Ts
)
AND p3.Cost = p.Cost
)
The DISTINCT
can be removed if the table has a unique index on (PO, log_Ts)
Upvotes: 0
Reputation: 44326
;with a as
(
select po, log_ts, cost, row_number() over (partition by po order by log_ts) rn
from <table>
), b as
(
select po, log_ts, cost, 1 grp, rn from a where rn = 1
union all
select a.po, a.log_ts, a.cost, case when a.cost = b.cost then b.grp else b.grp+1 end, a.rn
from a
join b on a.rn = b.rn+1 and a.po = b.po
)
select po, min(log_ts) log_ts, cost
from b
group by po, cost, grp
Upvotes: 0
Reputation: 24134
You can try the following:
SELECT
the rows in the order of log_ts and associate a row number with each row.SELF JOIN
the result in #1 on firstTable.rownum = secondTable.rownum -1. Each row in the worktable will contain the current and next record.WHERE
clause such that the cost value is different and you are interested in the tuple from the second set in the SELF JOIN
.E.g.
RowNum PO log_Ts cost RowNum PO log_Ts cost
1 123 2012-06-26-10.37.44.035385 2.5896 2 123 2012-06-27-02.16.14.706817 2.5896
2 123 2012-06-27-02.16.14.706817 2.5896 3 123 2012-06-26 10.28.57.540731 2.591
3 123 2012-06-26-10.28.57.540731 2.591 4 123 2012-06-26-10.37.43.948940 2.5896
Query:
WITH T (PO, Log_Ts, Cost, RowNum)
AS
(
SELECT PO, Log_Ts, Cost, Row_Number() OVER(ORDER BY Log_Ts) FROM PO_INFO
)
SELECT T2.*
FROM
T t1
JOIN T t2
ON t1.rownum = t2.rownum - 1 AND t1.cost != t2.cost
Result:
123 2012-06-26 10.28.57.540731 2.591 2
123 2012-06-26-10.37.43.948940 2.5896 3
HTH.
Upvotes: 1
Reputation: 22245
This is something that really lends itself to procedural logic (code) rather than set-based logic (sql). As such I would suggest doing it in code if possible where you can simply sort the data-set and loop through it.
If code is not an option you can do the same thing in a stored procedure using a cursor and a loop.
Upvotes: 0