Bing
Bing

Reputation: 3171

MySQL/PHP: Search rows by datetime RANGE given overlaps in the table

I have a table that shows me when people are available to work, like the following:

+------+---------------------+---------------------+
| name | start               | end                 |
+------+---------------------+---------------------+    
| Odin | 2015-07-01 11:00:00 | 2015-07-01 11:30:00 |
| Thor | 2015-07-01 11:00:00 | 2015-07-01 11:30:00 |
| Odin | 2015-07-01 11:20:00 | 2015-07-01 12:45:00 |
| Odin | 2015-07-01 12:30:00 | 2015-07-01 15:30:00 |
| Thor | 2015-07-01 15:00:00 | 2015-07-01 17:00:00 |
+------+---------------------+---------------------+

I'd like to check if a specific person is available to work in a given range. For example, I want to have a PHP function that returns the names of people available to work in a given range, like so: canWork($start, $end)

This important part is handling the overlaps, especially since the table could be very, very large. For example, if I called canWork('2015-07-01 11:10:00', '2015-07-01 15:30:00') I would expect to get Odin back given the 1st, 3rd and 4th rows of the table together do cover that range.

Is there an easy way to do this with MySQL? Or PHP?

Upvotes: 1

Views: 83

Answers (2)

Norbert
Norbert

Reputation: 6084

Try to avoid looping over data in this sort of large data situations. In a similar exercise SQL was able to deliver in seconds what in code took hours. Having a smart look at the data pays of.

The smart step here is: You can reduce the number of possible matches by checking the SUM of the time: The time in the range should be equal (or smaller) then the SUM of the time in the records.

However since the start time entered can be smaller then the starttime you are looking for, and the end time can be larger then the endtime you are looking for, you first have to find the end time closest to endtime and start time closest to starttime.

(end is a reserved word, so this code will not work with that columnname, endtime and starttime are the variables for the schedule check)

Start time per user (last possible):

SELECT name,MAX(start) AS MAX_start
FROM scheduleTable
WHERE start<=starttime
GROUP BY name;

End time per user (first possible)

SELECT name,MIN(`end`) AS MIN_end
FROM scheduleTable
WHERE `end`>=endtime
GROUP BY name;

Joining these together gives a subset of possible users, plus this can be filtered on the

SELECT name, MAX_start,MIN_end
FROM 
(SELECT name,MIN(`end`) AS MIN_end
FROM scheduleTable
WHERE `end`>=endtime
GROUP BY name) a
INNER JOIN
(SELECT name,MAX(start) AS MAX_start
FROM scheduleTable
WHERE start<=starttime
GROUP BY name) b ON a.name=b.name;

This will give you a schedule with a valid end as close as possible to the endtime indicated for scheduling purpose but at least equal to the indicated endtime.

Applying the fact that all the time frames together must at least be equal to the endtime-starttime:

SELECT st.name
FROM scheduleTable st
INNER JOIN (
  SELECT name, MAX_start AS start,MIN_end AS end
  FROM 
  (SELECT name,MIN(`end`) AS MIN_end
  FROM scheduleTable
  WHERE `end`>=endtime
  GROUP BY name) a
  INNER JOIN
  (SELECT name,MAX(start) AS MAX_start
  FROM scheduleTable
  WHERE start<=starttime
  GROUP BY name) b ON a.name=b.name
) et ON st.name=et.name
WHERE et.start>={starttime} AND `end`<=et.endtime AND et.name=st.name
GROUP BY st.name
HAVING SUM(st.`end`-st.start)>=(endtime-starttime);

You might have to manipulate the start and end time to unix time or use mysql date time functions for the calculations.

There still might be gaps: Those need a second check. For this use the group_concat to get some data we can pass as 1 call into a function. The function results in 0 for: no gaps found, 1 for gaps found:

SELECT a.name
FROM (
  SELECT st.name,
    GROUP_CONCAT(start ORDER BY start ASC SEPARATOR ',') starttimelist,
    GROUP_CONCAT(`end` ORDER BY `end` ASC SEPARATOR ',') endtimelist
  FROM scheduleTable st
  INNER JOIN (
    SELECT name, MAX_start AS start,MIN_end AS end
    FROM 
    (SELECT name,MIN(`end`) AS MIN_end
    FROM scheduleTable
    WHERE `end`>=endtime
    GROUP BY name) a
    INNER JOIN
    (SELECT name,MAX(start) AS MAX_start
    FROM scheduleTable
    WHERE start<=starttime
    GROUP BY name) b ON a.name=b.name
  ) et ON st.name=et.name
  WHERE et.start>={starttime} AND `end`<=et.endtime AND et.name=st.name
  GROUP BY st.name
  HAVING SUM(st.`end`-st.start)>=(endtime-starttime);
) a
WHERE gapCheck(starttimelist,endtimelist)=0;

WARNING: Do not add DISTINCT to the GROUP_CONCAT: The start/endtimelist will have different lengths and the gaCcheck function will fail....

The function gapCheck: In this function the first start time and the last end time can be ignored: start time is larger or equal then starttime and end time is larger or equal to endtime. So no boundary checks are needed, plus boundaries do not have to be checked for gaps anyway.

CREATE FUNCTION gapCheck(IN starttimeList VARCHAR(200),endtimeList VARCHAR(200))
BEGIN
  DECLARE helperTimeStart,helperTimeEnd,prevHelperTimeStart,prevHelperTimeEnd DATETIME
  DECLARE c,splitIndex,gap INT
SET c-0;
SET gap=0;
WHILE(c=0) DO
  SET splitIndex=INSTR(starttimeList,',');
  IF(splitIndex>0) THEN
    SET helperTimeStart=SUBSTRING(starttimeList,1,splitIndex-1);
    SET starttimeList=SUBSTRING(starttimeList,splitIndex); /* String for the next iteration */
  ELSE
    SET helperTimeStart=starttimeList; /* End of list reached */
    SET helperTimeEnd=endtimeList; /* end can be set too: Lists are of same length */
    SET c=1;
  END IF;

  IF(splitIndex>0) THEN
    SET splitIndex=INSTR(endtimeList,',');
    SET helperTimeEnd=SUBSTRING(endtimeList,1,splitIndex-1);
  END IF;

  IF prevHelperTimeEnd>=helperTimeEnd THEN /* if prevHelperTimeEnd is not set, this is false and the check is skipped: on the first record we can not check anything */
    /* If previous end time > current start time: We have a gap */
    IF CAST(prevHelperTimeEnd AS DATETIME)>=CAST(helperTimeStart AS DATETIME) THEN
      gap=1;
    END IF;
  END IF;

  /* save some data for the next loop */
  SET prevHelperTimeStart=helperTimeStart;
  SET prevHelperTimeEnd=helperTimeEnd;
END WHILE;
RETURN gap;
END;

Upvotes: 1

Sahil Ahuja
Sahil Ahuja

Reputation: 2463

I think the shortest way to do this would be to

1) First merge all timelines for the same person with overlaps. For eg. row 1 and row 3 would be merged to change the end time of row 1 to '2015-07-01 12:45:00' (and row 3 would be deleted or marked used), and then row 1 and row 4 would be merged to again change the end time of row 1 to '2015-07-01 15:30:00'.

2) Once you have a table of non-overlapping timelines, this is a simple problem of finding rows where start <= $start and end >= $end.

For 1) I would prefer executing this process in PHP by first copying the whole table in a data structure

$a = array();
//in a for loop after a select all query: for (all elements) {
  $a[$name][$start] = $end));
//} end of for loop

And then removing all overlaps from that data structure:

for($a as $currName => $timeArray) {
  ksort($timeArray);
  removeOverlaps(&$timeArray);
}

function removeOverlaps($timeArray) {
  $allKeys = array_keys($timeArray);
  $arrLength = count($allKeys);
  for ($i = 0; $i < $arrlength; ++$i) {
    $start = $allKeys[$i];
    if(array_key_exists($start, $timearray)) {
      $end = $timeArray[$start])
      for ($j = $i; $j < $arrlength; ++$j) {
        $newStart = $allKeys[$j];
        $newEnd = $timeArray[$newStart];
        if($newStart <= $end) && ($newEnd > $end)) {
          $timeArray[$start] = $newEnd;
          unset($timeArray[$newStart]);
        }
      }
    }
  }
}

Then continue with 2).

Upvotes: 0

Related Questions