Reputation: 2299
I have a MySQL table that looks like this:
record_id amount start_date end_date
1 20000 2018-01-01 2018-12-01
1 -15000 2018-02-01 2018-04-01
1 50000 2018-04-02 2018-06-30
This gets translated in PHP to an array that looks like this:
[1] => [
[
"start_date" => "2018-01-01",
"end_date" => "2018-12-01",
"amount" => 20000
],
[
"start_date" => "2018-02-01",
"end_date" => "2018-04-01",
"amount" => -15000
],
[
"start_date" => "2018-04-02",
"end_date" => "2018-06-30",
"amount" => 50000
],
]
The problem is, I need to sum overlapping amounts in the date ranges, and keep an original amount for the longest date range. So the resulting MySQL table would look like this:
record_id amount start_date end_date
1 20000 2018-01-01 2018-01-30
1 5000 2018-02-01 2018-04-01
1 70000 2018-04-02 2018-06-30
1 20000 2018-07-01 2018-12-01
And the resulting PHP array would look like this:
[1] => [
[
"start_date" => "2018-01-01",
"end_date" => "2018-01-30",
"amount" => 20000
],
[
"start_date" => "2018-02-01",
"end_date" => "2018-04-01",
"amount" => 5000
],
[
"start_date" => "2018-04-02",
"end_date" => "2018-06-30",
"amount" => 70000
],
[
"start_date" => "2018-07-01",
"end_date" => "2018-12-01",
"amount" => 20000
],
]
Basically, the amount from the first start date to the day before the next start date is the current amount. Then for any overlapping date range, the amount gets summed. So 20,000 + -15,000 = 5,000. Then, for the date range 2018-04-02 to 2018-06-30, the initial amount (still valid since it runs through december) will be added to the 50,000, making it 70,000. Finally, the end date is still running so we make a new entry for 20,000.
I could make pseudo-code for a single case like this, but these cases are all dynamic, and the overlapping dates and ranges could change.
Apparently, it's impossible / very hard to do this in MySQL, so I was wondering if anyone had experience doing these overlapping date ranges in PHP.
Edit: I've marked strawberry's answer as correct as it works. If possible, he can paste this with his answer and I can remove this edit, but there is some work to be done before his answer works (pre mysql 8.0). You can run this script to get his answer to work, before doing anything else (source is slightly modified from this version):
DROP TABLE IF EXISTS calendar;
CREATE TABLE calendar(
id INTEGER PRIMARY KEY, -- year*10000+month*100+day
dt DATE NOT NULL,
year INTEGER NOT NULL,
month INTEGER NOT NULL, -- 1 to 12
day INTEGER NOT NULL, -- 1 to 31
quarter INTEGER NOT NULL, -- 1 to 4
week INTEGER NOT NULL, -- 1 to 52/53
day_name VARCHAR(9) NOT NULL, -- 'Monday', 'Tuesday'...
month_name VARCHAR(9) NOT NULL, -- 'January', 'February'...
holiday_flag CHAR(1) DEFAULT 'f' CHECK (holiday_flag in ('t', 'f')),
weekend_flag CHAR(1) DEFAULT 'f' CHECK (weekend_flag in ('t', 'f')),
event VARCHAR(50),
UNIQUE td_ymd_idx (year,month,day),
UNIQUE td_dt_idx (dt)
) Engine=MyISAM;
DROP PROCEDURE IF EXISTS fill_date_dimension;
DELIMITER //
CREATE PROCEDURE fill_date_dimension(IN startdate DATE,IN stopdate DATE)
BEGIN
DECLARE currentdate DATE;
SET currentdate = startdate;
WHILE currentdate < stopdate DO
INSERT INTO calendar VALUES (
YEAR(currentdate)*10000+MONTH(currentdate)*100 + DAY(currentdate),
currentdate,
YEAR(currentdate),
MONTH(currentdate),
DAY(currentdate),
QUARTER(currentdate),
WEEKOFYEAR(currentdate),
DATE_FORMAT(currentdate,'%W'),
DATE_FORMAT(currentdate,'%M'),
'f',
CASE DAYOFWEEK(currentdate) WHEN 1 THEN 't' WHEN 7 then 't' ELSE 'f' END,
NULL);
SET currentdate = ADDDATE(currentdate,INTERVAL 1 DAY);
END WHILE;
END
//
DELIMITER ;
TRUNCATE TABLE calendar;
CALL fill_date_dimension('1-01-01','2040-01-01');
OPTIMIZE TABLE calendar;
Upvotes: 0
Views: 70
Reputation: 33935
Here's a solution that uses a simple calendar(dt) utility table...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(start_date DATE NOT NULL PRIMARY KEY
,end_date DATE NOT NULL
,amount INT NOT NULL
);
INSERT INTO my_table VALUES
('2018-01-01','2018-12-01', 20000),
('2018-02-01','2018-04-01',-15000),
('2018-04-02','2018-06-30', 50000);
SELECT MIN(dt) range_start
, MAX(dt) range_end
, MIN(amount) amount
FROM
(
SELECT dt
, amount
, CASE WHEN @prev=amount THEN @i:=@i ELSE @i:=@i+1 END i
, @prev:=amount
FROM
( SELECT x.*
, SUM(y.amount) amount
FROM calendar x
JOIN my_table y
ON x.dt BETWEEN y.start_date AND y.end_date
GROUP
BY x.dt
) a
JOIN (SELECT @prev:=null,@i:=0) vars
ORDER
BY dt
) n
GROUP
BY i;
+-------------+------------+--------+
| range_start | range_end | amount |
+-------------+------------+--------+
| 2018-01-01 | 2018-01-31 | 20000 |
| 2018-02-01 | 2018-04-01 | 5000 |
| 2018-04-02 | 2018-06-30 | 70000 |
| 2018-07-01 | 2018-12-01 | 20000 |
+-------------+------------+--------+
Upvotes: 2