Adria Ciurana
Adria Ciurana

Reputation: 934

Custom function inside order by

I'm trying to make a pretty complex query. I have a database with blocks. Each block has a start date, an end date and the module to which it belongs. I have to calculate the turnover, which would be the difference between consecutive blocks (for the block[i]):

block[i].start - block[i - 1].end

Let's put the following example, I have these data:

create table blocks (start datetime, end datetime, module integer);
insert into blocks (start, end, module)
values 
('2016-04-13 09:00:00',  '2016-04-13 10:00:00', 1), -- diff: null or 0
('2016-04-13 11:00:00',  '2016-04-13 12:00:00', 1), -- diff: 1hour
('2016-04-13 12:30:00',  '2016-04-13 14:00:00', 1), -- diff: 30minutes
                                                    -- turnoverAvg: 45min = (1h + 30min) / 2

('2016-04-13 09:00:00',  '2016-04-13 10:00:00', 2), -- diff: null or 0
('2016-04-13 12:00:00',  '2016-04-13 12:30:00', 2), -- diff: 2hour
('2016-04-13 13:30:00',  '2016-04-13 14:30:00', 2), -- diff: 1hour
                                                    -- turnoverAvg: 90min = (2h + 1h) / 2

('2016-04-14 14:30:00',  '2016-04-14 16:00:00', 2), -- diff: null or 0
('2016-04-14 17:00:00',  '2016-04-14 18:00:00', 2), -- diff: 1hour
                                                    -- turnoverAvg: 60min = 1h/1

('2016-04-13 09:00:00',  '2016-04-13 10:00:00', 3), -- diff: null or 0
('2016-04-13 10:00:00',  '2016-04-13 11:00:00', 3), -- diff: 0
('2016-04-13 12:00:00',  '2016-04-13 13:00:00', 3), -- diff: 1hour
('2016-04-13 14:00:00',  '2016-04-13 15:00:00', 3), -- diff: 1hour
('2016-04-13 16:00:00',  '2016-04-13 17:00:00', 3), -- diff: 1hour
                                                    -- turnoverAvg: 45min = (0 + 1h + 1h + 1h) / 4

('2016-04-13 09:00:00',  '2016-04-13 10:00:00', 4), -- diff: null or 0
                                                    -- turnoverAvg: null

('2016-04-13 09:00:00',  '2016-04-13 15:00:00', 5), -- diff: null or 0
('2016-04-13 19:00:00',  '2016-04-13 20:00:00', 5); -- diff: 4hour
                                                    -- turnoverAvg: 240min = 4h/1

I should make the following query (pseudo-code):

SELECT turnoverAVG (rows of each group by)
FROM blocks
GROUP BY DATE (start), module

Where turnoverAvg would be a function like this (pseudo-code):

function turnoverAVG(rows):
  acc = 0.0
  for(i=1; i < rows.length; i++)
    d = row[i].start - rows[i - 1].end
    acc += d
  return acc/(rows.length - 1)

Actually I have tried many things, but I do not know where to start ... If someone has an idea, I would greatly appreciate it.

EDIT:

The output would be similar to:

turnoverAVG, module, day
45min, 1, 2016-04-13
1:30hour, 2, 2016-04-13
1hour, 2, 2016-04-14 -- different day but same module
45min, 3, 2016-04-13
4hour, 5, 2016-04-13

The turnoverAVG would be fine if it was in minutes, but I've written it that way to make it better understood. As you can see it never computes the first block because it can not be subtracted with the previous one (there is no previous block).

Upvotes: 2

Views: 70

Answers (1)

Solarflare
Solarflare

Reputation: 11116

Functions like this are called window functions. They are only available starting with MySQL 8.

Until then, you will have to find an alternative way to write do your query, see e.g. this question. Most times, you will do it by using variables, although the sql way is to use joins.

But in your specific case, you do not actually need these: the turnovertime is not only the sum between the modules (for which you need to know the previous row), but also the time between start and end of your day (for which you only need min and max) minus the time the modules where running (for which you do not need the previous row).

So try this:

select 
  module,
  date(start),
  case when count(module) > 1
    then (TIMESTAMPDIFF(Minute,min(start),max(end)) -
           sum(TIMESTAMPDIFF(Minute,start, end))) 
         / (count(module) - 1)
    else null 
  end as turnoverAVG,
  -- details, just for information:
  TIMESTAMPDIFF(Minute,min(start),max(end)) as total_day,
  sum(TIMESTAMPDIFF(Minute,start, end)) as module_duration,
  TIMESTAMPDIFF(Minute,min(start),max(end)) -
    sum(TIMESTAMPDIFF(Minute,start, end)) as turnover,
  count(module) as cnt
from blocks
group by date(start), module;

The 4 additional columns are just there to so you can see the different termsn used in the calculation, and you can remove them.

All modules are required to start and end on the same date (although you can simply modify it to support overnight modules). It also does not correct the times if the modules overlap (but so doesn't your pseudocode).

It's not entirely clear if you want to include days with only one module (as suggested in the comment for module 4) or not (as suggested in your sample output). If you want to exclude these, you can add e.g. having count(module) > 1 at the end of the query.

Upvotes: 1

Related Questions