type_outcast
type_outcast

Reputation: 635

INSERT interpolated rows into existing table

I have a MySQL table similar to this simplified example:

          orders table
--------------------------------
orderid stockid rem_qty reported
--------------------------------
1000000     100     500 00:01:00
1000000     100     200 01:10:00
1000000     100     200 03:20:00
1000000     100     100 04:30:00
1000000     100      50 11:30:00
:
1000010     100     100 00:01:00
1000010     100     100 01:10:00
1000010     100      20 03:20:00
:
1000020     200    1000 03:20:00
1000020     200     995 08:20:00
1000020     200     995 11:50:00
--------------------------------

The table comes from a 3rd party, weighs in at some 80-100M rows daily, and the format is fixed. It would be good, except it lacks rows showing when rem_qty reaches zero. The good news is, I can estimate them, at least a good upper/lower bound:

The 3rd party scans each distinct stockid at essentially random times throughout the day, and returns one row for each open orderid at that time. For example, stockid = 100 was scanned at (00:01, 01:10, 03:20, 04:30, 11:30). At each time, there will be a row for every current orderid with that stockid. Hence, one can see that orderid = 1000000 was still open at 11:30 (the last scan in our data), but sometime between 03:20 and 04:30, orderid = 1000010 sold out. (The times for stockid = 200 have no bearing on stockid = 100).

So, what I would like to do is INSERT the interpolated rows with rem_qty = 0 for each sold-out order. In this case, we can (only) say that orderid = 1000010 went to 0 at AVG('03:20:00','04:30:00'), so I would like to INSERT the following row:

      orders table INSERT
--------------------------------
orderid stockid rem_qty reported
--------------------------------
1000010     100       0 03:55:00
--------------------------------

Trouble is, my SQL is rusty and I've not been able to figure out this complex query. Among other failed attempts, I've tried various JOINs, made a TEMPORARY TABLE stock_report(stockid,last_report), and I can do something like this:

SELECT      orders.stockid,
            orderid,
            MAX(reported),
            TIMEDIFF(last_report,MAX(reported)) as timediff
FROM        orders
INNER JOIN  stock_report
        ON  orders.stockid = stock_report.stockid
GROUP BY    orderid
HAVING      timediff > 0
ORDER BY    orderid

This would show every sold-out order, along with the HH:MM:SS difference between the last time the orderid was reported, and the last time its stockid was reported. It's maybe a good start, but instead of last_report, I need to be able to calculate a next_report column (specific to this orderid, which would basically be:

SELECT      MIN(reported) AS next_report
FROM        orders
WHERE       reported > @order_max_reported
ORDER BY    reported
LIMIT       1

But that's just a vain attempt to illustrate part of what I'm after. Again, what I really need is a way to INSERT new rows into the orders() table at the AVG() time the order's rem_qty went to 0, as in the orders table INSERT example table, above. Or, maybe the 64,000 GFLOP question: would I be better off moving this logic to my main (application) language? I'm working with 100 million rows/day, so efficiency is a concern.

Apologies for the lengthy description. This really is the best I could do to edit for conciseness! Can anyone offer any helpful suggestions?

Upvotes: 2

Views: 58

Answers (1)

Kickstart
Kickstart

Reputation: 21533

Possible to do. Have a sub query that gets the max reported time for each order id / stock id and join that against the orders table where the stock id is the same and the latest time is less that the order time. This gets you all the report times for that stock id that are greater than the latest time for that stock id and order id.

Use MIN to get the lowest reported time. Convert the 2 times to seconds, add them together and divide by 2, then convert back from seconds to a time.

Something like this:-

SELECT orderid, stockid, 0, SEC_TO_TIME((TIME_TO_SEC(next_poss_order_report) + TIME_TO_SEC(last_order_report)) / 2)
FROM
(
    SELECT a.orderid, a.stockid, last_order_report, MIN(b.reported) next_poss_order_report
    FROM 
    (
        SELECT orderid, stockid, MAX(reported) last_order_report
        FROM orders_table
        GROUP BY orderid, stockid
    ) a
    INNER JOIN orders_table b
    ON a.stockid = b.stockid
    AND a.last_order_report < b.reported
    GROUP BY a.orderid, a.stockid, a.last_order_report
) sub0;

SQL fiddle here:-

http://www.sqlfiddle.com/#!2/cf129/17

Possible to simplify this a bit to:-

SELECT a.orderid, a.stockid, 0, SEC_TO_TIME((TIME_TO_SEC(MIN(b.reported)) + TIME_TO_SEC(last_order_report)) / 2)
FROM 
(
    SELECT orderid, stockid, MAX(reported) last_order_report
    FROM orders_table
    GROUP BY orderid, stockid
) a
INNER JOIN orders_table b
ON a.stockid = b.stockid
AND a.last_order_report < b.reported
GROUP BY a.orderid, a.stockid, a.last_order_report;

These queries might take a while, but are probably more efficient than running many queries from scripted code.

Upvotes: 1

Related Questions