Reputation: 635
I have a MySQL table similar to this simplified example:
orders table
--------------------------------
orderid stockid rem_qty reported
--------------------------------
1000000 100 500 00:01:00
1000000 100 200 01:10:00
1000000 100 200 03:20:00
1000000 100 100 04:30:00
1000000 100 50 11:30:00
:
1000010 100 100 00:01:00
1000010 100 100 01:10:00
1000010 100 20 03:20:00
:
1000020 200 1000 03:20:00
1000020 200 995 08:20:00
1000020 200 995 11:50:00
--------------------------------
The table comes from a 3rd party, weighs in at some 80-100M rows daily, and the format is fixed. It would be good, except it lacks rows showing when rem_qty reaches zero. The good news is, I can estimate them, at least a good upper/lower bound:
The 3rd party scans each distinct stockid at essentially random times throughout the day, and returns one row for each open orderid at that time. For example, stockid = 100 was scanned at (00:01, 01:10, 03:20, 04:30, 11:30)
. At each time, there will be a row for every current orderid with that stockid. Hence, one can see that orderid = 1000000 was still open at 11:30 (the last scan in our data), but sometime between 03:20 and 04:30, orderid = 1000010 sold out. (The times for stockid = 200 have no bearing on stockid = 100).
So, what I would like to do is INSERT
the interpolated rows with rem_qty = 0 for each sold-out order. In this case, we can (only) say that orderid = 1000010 went to 0 at AVG('03:20:00','04:30:00')
, so I would like to INSERT
the following row:
orders table INSERT
--------------------------------
orderid stockid rem_qty reported
--------------------------------
1000010 100 0 03:55:00
--------------------------------
Trouble is, my SQL is rusty and I've not been able to figure out this complex query. Among other failed attempts, I've tried various JOIN
s, made a TEMPORARY TABLE stock_report(stockid,last_report)
, and I can do something like this:
SELECT orders.stockid,
orderid,
MAX(reported),
TIMEDIFF(last_report,MAX(reported)) as timediff
FROM orders
INNER JOIN stock_report
ON orders.stockid = stock_report.stockid
GROUP BY orderid
HAVING timediff > 0
ORDER BY orderid
This would show every sold-out order, along with the HH:MM:SS difference between the last time the orderid was reported, and the last time its stockid was reported. It's maybe a good start, but instead of last_report, I need to be able to calculate a next_report column (specific to this orderid, which would basically be:
SELECT MIN(reported) AS next_report
FROM orders
WHERE reported > @order_max_reported
ORDER BY reported
LIMIT 1
But that's just a vain attempt to illustrate part of what I'm after. Again, what I really need is a way to INSERT
new rows into the orders() table at the AVG()
time the order's rem_qty went to 0, as in the orders table INSERT
example table, above. Or, maybe the 64,000 GFLOP question: would I be better off moving this logic to my main (application) language? I'm working with 100 million rows/day, so efficiency is a concern.
Apologies for the lengthy description. This really is the best I could do to edit for conciseness! Can anyone offer any helpful suggestions?
Upvotes: 2
Views: 58
Reputation: 21533
Possible to do. Have a sub query that gets the max reported time for each order id / stock id and join that against the orders table where the stock id is the same and the latest time is less that the order time. This gets you all the report times for that stock id that are greater than the latest time for that stock id and order id.
Use MIN to get the lowest reported time. Convert the 2 times to seconds, add them together and divide by 2, then convert back from seconds to a time.
Something like this:-
SELECT orderid, stockid, 0, SEC_TO_TIME((TIME_TO_SEC(next_poss_order_report) + TIME_TO_SEC(last_order_report)) / 2)
FROM
(
SELECT a.orderid, a.stockid, last_order_report, MIN(b.reported) next_poss_order_report
FROM
(
SELECT orderid, stockid, MAX(reported) last_order_report
FROM orders_table
GROUP BY orderid, stockid
) a
INNER JOIN orders_table b
ON a.stockid = b.stockid
AND a.last_order_report < b.reported
GROUP BY a.orderid, a.stockid, a.last_order_report
) sub0;
SQL fiddle here:-
http://www.sqlfiddle.com/#!2/cf129/17
Possible to simplify this a bit to:-
SELECT a.orderid, a.stockid, 0, SEC_TO_TIME((TIME_TO_SEC(MIN(b.reported)) + TIME_TO_SEC(last_order_report)) / 2)
FROM
(
SELECT orderid, stockid, MAX(reported) last_order_report
FROM orders_table
GROUP BY orderid, stockid
) a
INNER JOIN orders_table b
ON a.stockid = b.stockid
AND a.last_order_report < b.reported
GROUP BY a.orderid, a.stockid, a.last_order_report;
These queries might take a while, but are probably more efficient than running many queries from scripted code.
Upvotes: 1