Reputation: 2249
I have two tables: sales, actions
Sales table:
id, datetime, status
--------------------
Actions table:
id, datetime, sales_id, action
------------------------------
There's a many-to-one relations ship between the actions and sales tables. For each sales record, there could be numerous actions. I am trying to determine, by each hour of the day, what the average time difference is between when sales records are first created, and when the first action record associated with it's respective sales record was created.
In other words, how fast (in hours) are sales agents responding to leads, based on what hour of the day the lead came in.
Here's what I tried:
SELECT
FROM_UNIXTIME(sales.datetime, '%H') as Hour,
count(actions.id) AS actions,
(MIN(actions.datetime) - sales.datetime) / 3600 as Lag
FROM
actions
INNER JOIN sales ON actions.sales_id = sales.id
group by Hour
I get what looks like reasonable hours numbers for 'Lag', but I am not convinced they're accurate:
Hour Actions Lag
00 66 11.0442
01 30 11.2758
02 50 8.2900
03 25 5.7492
.
.
.
23 77 34.4744
My question is, is this the correct way to get the values for the first action that was recorded for a given sales record? :
(MIN(actions.createDate) - sales.createDate) / 3600 as Lag
Upvotes: 0
Views: 70
Reputation: 108796
This query has two layers, and it's helpful to crawl through them both.
The lowest layer should compute the lag time from sales.datetime
to the earliest action.datetime
for each row of sales.
That will probably use a MIN()
function.
The next layer will compute the statistics for those lag times, worked out in the lowest layer, by hour of the day. That will use an AVG()
function.
Here's the lowest layer:
SELECT s.id, s.datetime, s.status,
TIMEDIFF(SECOND, MIN(a.datetime), s.datetime) AS lag_seconds
FROM sales AS s
JOIN actions AS a ON s.id = a.sales_id AND a.datetime > s.datetime
GROUP BY s.id, s.datetime, s.status
The second part of that ON
clause makes sure that you only consider actions taken after the sales order was entered. It may be unnecessary, but I thought I'd throw it in.
Here's the second layer.
SELECT HOUR(datetime) AS hour_Sale_entered,
COUNT(*) AS number_in_that_hour,
AVG(lag_seconds) / 3600.0 AS Lag_to_first_action
FROM (
SELECT s.id, s.datetime, s.status,
TIMEDIFF(SECOND, MIN(a.datetime), s.datetime) AS lag_seconds
FROM sales AS s
JOIN actions AS a ON s.id = a.sales_id AND a.datetime > s.datetime
GROUP BY s.id, s.datetime, s.status
) AS d
GROUP BY HOUR(datetime)
ORDER BY HOUR(datetime)
See how there are two nested aggregations (GROUP BY
) operations? The inner one identifies the first action, and the second one does the hourly averaging.
One more tidbit. If you want to include sales
items that have not yet been acted on, you can do this:
SELECT HOUR(datetime) AS hour_Sale_entered,
COUNT(*) AS number_in_that_hour,
SUM(no_action) AS not_acted_upon_yet,
AVG(lag_seconds) / 3600.0 AS Lag_to_first_action
FROM (
SELECT s.id, s.datetime, s.status,
TIMEDIFF(SECOND, MIN(a.datetime), s.datetime) AS lag_seconds,
IFNULL(a.id,1,0) AS no_action
FROM sales AS s
LEFT JOIN actions AS a ON s.id = a.sales_id AND a.datetime > s.datetime
GROUP BY s.id, s.datetime, s.status
) AS d
GROUP BY HOUR(datetime)
ORDER BY HOUR(datetime)
The average of lag_seconds will still be correct, because the sales
rows with no action
rows will have NULL values for that, and AVG()
ignores nulls.
Upvotes: 0
Reputation: 781848
It should be:
MIN(actions.datetime - sales.datetime) / 3600 AS Lag
You way is getting the first action from any sale within the hour, and subtracting each sale's timestamp from its timestamp. You need to do the subtraction only within actions and sales that are joined by the ID.
Upvotes: 1