Reputation: 16468
I have this table structure:
EDIT more complex example: add hidden range
category| day | a |
--------|------------|-------|
1 | 2012-01-01 | 4 |
1 | 2012-01-02 | 4 |
1 | 2012-01-03 | 4 |
1 | 2012-01-04 | 4 |
1 | 2012-01-05 | 5 |
1 | 2012-01-06 | 5 |
1 | 2012-01-07 | 5 |
1 | 2012-01-08 | 4 |
1 | 2012-01-09 | 4 |
1 | 2012-01-10 | 4 |
1 | 2012-01-11 | 5 |
1 | 2012-01-12 | 5 |
1 | 2012-01-16 | 5 |
1 | 2012-01-17 | 5 |
1 | 2012-01-18 | 5 |
1 | 2012-01-19 | 5 |
...
with 'category-day' as unique keys. I would extract a range of dates, for each category, according with column "a" and given limit range, like so:
1,2012-01-01|2012-01-04,4
1,2012-01-05|2012-01-07,5
1,2012-01-08|2012-01-10,4
1,2012-01-11|2012-01-12,5
1,2012-01-13|2012-01-15,0
1,2012-01-16|2012-01-19,5
or similar.
I search the best way for do it. Using only mysql preferably but also with a little bit of php.
NOTE1: not all day are inserted: between two days non-contiguos could not be other days. In this case I would in output the missed range with column "a" = 0.
NOTE2: I did it with a simple query and some rows of php but I don't like it because my simple algorithm need a cycle for each day in range multiplied for each category found. If range is too big and there are too much categories, that's not so good.
FINAL EDIT: OK! After reading all comments and answers, I think not exists a valid, efficient and, at same time, readable solution. So Mosty Mostacho answer is a no 100% valid solution, but it has 100% valid suggestions. Thank you all.
Upvotes: 8
Views: 582
Reputation: 9300
Completely on mysql side will have performance adv: Once the procedure has been created, it runs within 0.35 - 0.37 sec
create procedure fetch_range()
begin
declare min date;
declare max date;
create table testdate(
date1 date
);
select min(day) into min
from category;
select max(day) into max
from category;
while min <= max do
insert into testdate values(min);
set min = adddate(min,1);
end while;
select concat(category,',',min(day)),concat(max(day),',',a)
from(
SELECT if(isNull(category),@category,category) category,if(isNull(day),date1,day) day,@a,if(isNull(a) || isNull(@a),if(isNull(a) && isNull(@a),@grp,@grp:=@grp+1),if(@a!=a,@grp:=@grp+1,@grp)) as sor_col,if(isNull(a),0,a) as a,@a:=a,@category:= category
FROM `category`
RIGHT JOIN testdate ON date1 = category.day) as table1
group by sor_col;
drop table testdate;
end
o/p:
1,2012-01-01|2012-01-04,4
1,2012-01-05|2012-01-07,5
1,2012-01-08|2012-01-10,4
1,2012-01-11|2012-01-12,5
1,2012-01-13|2012-01-15,0
1,2012-01-16|2012-01-19,5
Here is mysql solution which will give the desired result excluding the missed range only.
PHP: The missing range can be added through php.
$sql = "set @a=0,@grp=0,@datediff=0,@category=0,@day='';";
mysql_query($sql);
$sql= "select category,min(day)min,max(day) max,a
from(
select category,day,a,concat(if(@a!=a,@grp:=@grp+1,@grp),if(datediff(@day,day) < -1,@datediff:=@datediff+1,@datediff)) as grp_datediff,datediff(@day,day)diff, @day:= day,@a:=a
FROM category
order by day)as t
group by grp_datediff";
$result = mysql_query($sql);
$diff = 0;
$indx =0;
while($row = mysql_fetch_object($result)){
if(isset($data[$indx - 1]['max'])){
$date1 = new DateTime($data[$indx - 1]['max']);
$date2 = new DateTime($row->min);
$diff = $date1->diff($date2);
}
if ($diff->days > 1) {
$date = new DateTime($data[$indx-1]['max']);
$interval = new DateInterval("P1D");
$min = $date->add($interval);
$date = new DateTime($data[$indx-1]['max']);
$interval = new DateInterval("P".$diff->days."D");
$max = $date->add($interval);
$data[$indx]['category'] = $data[$indx-1]['category'];
$data[$indx]['min'] = $min->format('Y-m-d');
$data[$indx]['max'] = $max->format('Y-m-d');
$data[$indx++]['a'] = 0;
$data[$indx]['category'] = $row->category;
$data[$indx]['min'] = $row->min;
$data[$indx]['max'] = $row->max;
$data[$indx]['a'] = $row->a;
}else{
$data[$indx]['category'] = $row->category;
$data[$indx]['min'] = $row->min;
$data[$indx]['max'] = $row->max;
$data[$indx]['a'] = $row->a;
}
$indx++;
}
Upvotes: 1
Reputation: 51888
No need for PHP or temporary tables or anything.
DISCLAIMER: I did this just for fun. This stunt may be too crazy to be used in a production environment. Therefore I'm not posting this as a "real" solution. Also I'm not willing to explain how it works :) And I didn't rethink / refactor it. There might be more elegant ways and names / aliases could be more informative. So please no flame or anything.
Here's my solution. Looks more complicated than it is. I think it may be easier to understand than other answers, no offense :)
Setting up test data:
drop table if exists test;
create table test(category int, day date, a int);
insert into test values
(1 , '2012-01-01' , 4 ),
(1 , '2012-01-02' , 4 ),
(1 , '2012-01-03' , 4 ),
(1 , '2012-01-04' , 4 ),
(1 , '2012-01-05' , 5 ),
(1 , '2012-01-06' , 5 ),
(1 , '2012-01-07' , 5 ),
(1 , '2012-01-08' , 4 ),
(1 , '2012-01-09' , 4 ),
(1 , '2012-01-10' , 4 ),
(1 , '2012-01-11' , 5 ),
(1 , '2012-01-12' , 5 ),
(1 , '2012-01-16' , 5 ),
(1 , '2012-01-17' , 5 ),
(1 , '2012-01-18' , 5 ),
(1 , '2012-01-19' , 5 );
And here it comes:
SELECT category, MIN(`day`) AS firstDayInRange, max(`day`) AS lastDayInRange, a
, COUNT(*) as howMuchDaysInThisRange /*<-- as a little extra*/
FROM
(
SELECT
IF(@prev != qr.a, @is_a_changing:=@is_a_changing+1, @is_a_changing) AS is_a_changing, @prev:=qr.a, qr.* /*See if column a has changed. If yes, increment, so we can GROUP BY it later*/
FROM
(
SELECT
test.category, q.`day`, COALESCE(test.a, 0) AS a /*When there is no a, replace NULL with 0*/
FROM
test
RIGHT JOIN
(
SELECT
DATE_SUB(CURDATE(), INTERVAL number_days DAY) AS `day` /*<-- Create dates from now back 999 days. This query is surprisingly fast. And adding more numbers to create more dates, i.e. 10000 dates is also no problem. Therefor a temporary dates table might not be necessary?*/
FROM
(
SELECT (a + 10*b + 100*c) AS number_days FROM
(SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) aa
, (SELECT 0 AS b UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) bb
, (SELECT 0 AS c UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) cc
)sq /*<-- This generates numbers 0 to 999*/
)q USING(`day`)
, (SELECT @is_a_changing:=0, @prev:=0) r
/*This WHERE clause is just to beautify. It may not be necessary*/
WHERE q.`day` >= (SELECT MIN(test.`day`) FROM test) AND q.`day` <= (SELECT MAX(test.`day`) FROM test)
)qr
)asdf
GROUP BY is_a_changing
ORDER BY 2
Result looks like this:
category firstDayInRange lastDayInRange a howMuchDaysInThisRange
--------------------------------------------------------------------------
1 2012-01-01 2012-01-04 4 4
1 2012-01-05 2012-01-07 5 3
1 2012-01-08 2012-01-10 4 3
1 2012-01-11 2012-01-12 5 2
2012-01-13 2012-01-15 0 3
1 2012-01-16 2012-01-19 5 4
Upvotes: 2
Reputation: 43444
New edit:
As I told you in a comment, I strongly recommend you to use the quick query and then process the missing dates in PHP as that would be faster and more readable:
select
concat(@category := category, ',', min(day)) col1,
concat(max(day), ',', @a := a) col2
from t, (select @category := '', @a := '', @counter := 0) init
where @counter := @counter + (category != @category or a != @a)
group by @counter, category, a
However, if you still want to use the query version, then try this:
select
@counter := @counter + (category != @category or a != @a) counter,
concat(@category := category, ',', min(day)) col1,
concat(max(day), ',', @a := a) col2
from (
select distinct s.day, s.category, coalesce(t1.a, 0) a
from (
select (select min(day) from t) + interval val - 1 day day, c.category
from seq s, (select distinct category from t) c
having day <= (select max(day) from t)
) s
left join t t1 on s.day = t1.day and s.category = t1.category
where s.day between (
select min(day) from t t2
where s.category = t2.category) and (
select max(day) from t t2
where s.category = t2.category)
order by s.category, s.day
) t, (select @category := '', @a := '', @counter := 0) init
group by counter, category, a
order by category, min(day)
Note that MySQL won't allow you to create data on the fly, unless you hardcode UNIONS
, for example. This is an expensive process that's why I strongly suggest you to create a table with only an integer
field with values from 1
to X
, where X
is, at least the maximum amount of dates that separate the min(day)
and max(day)
from your table. If you're not sure about that date, just add 100,000
numbers and you'll be able to generate range periods for over 200 years. In the previous query, this table is seq
and the column it has is val
.
This results in:
+--------------+--------------+ | COL1 | COL2 | +--------------+--------------+ | 1,2012-01-01 | 2012-01-04,4 | | 1,2012-01-05 | 2012-01-07,5 | | 1,2012-01-08 | 2012-01-10,4 | | 1,2012-01-11 | 2012-01-12,5 | | 1,2012-01-13 | 2012-01-15,0 | | 1,2012-01-16 | 2012-01-19,5 | +--------------+--------------+
Ok, I'm lying. The result is actually returning a counter
column. Just disregard it, as removing it (using a derived table) would be even less performant!
Upvotes: 5
Reputation: 12973
Firstly, this is an extension of @Mosty's solution.
To enable Mosty's solution to include category/date combinations than do not exist in the table I took the following approach -
Start by getting a distinct list of categories and then join this to the entire date range -
SELECT category, `start` + INTERVAL id DAY AS `day`
FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
WHERE id <= DATEDIFF(`end`, `start`)
ORDER BY category, `day`
The above query builds the full date range using the table dummy
with a single field id
. The id
field contains 0,1,2,3,.... - it needs to have enough values to cover every day in the required date range. This can then be joined back to the original table to create a complete list of all categories for all dates and the appropriate value for a -
SELECT cj.category, cj.`day`, IFNULL(t.a, 0) AS a
FROM (
SELECT category, `start` + INTERVAL id DAY AS `day`
FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
WHERE id <= DATEDIFF(`end`, `start`)
ORDER BY category, `day`
) AS cj
LEFT JOIN t
ON cj.category = t.category
AND cj.`day` = t.`day`
This can then be applied to Mosty's query in place of table t
-
SELECT
CONCAT(@category := category, ',', MIN(`day`)) col1,
CONCAT(MAX(`day`), ',', @a := a) col2
FROM (
SELECT cj.category, cj.day, IFNULL(t.a, 0) AS a
FROM (
SELECT category, `start` + INTERVAL id DAY AS `day`
FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
WHERE id <= DATEDIFF(`end`, `start`)
ORDER BY category, `day`
) AS cj
LEFT JOIN t
ON cj.category = t.category
AND cj.`day` = t.day) AS t, (select @category := '', @a := '', @counter := 0) init
WHERE @counter := @counter + (category != @category OR a != @a)
GROUP BY @counter, category, a
Upvotes: 1
Reputation: 15616
and here's a one liner brutality for you :) (Note: Change the "datt" table name.)
select dd.category,
dd.day as start_day,
(select dp.day from
(
select 1 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
select * from datt where day = d1.day - INTERVAL 1 DAY and a=d1.a
)
union
select 2 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
select * from datt where day = d1.day + INTERVAL 1 DAY and a=d1.a
)
) dp where dp.day >= dd.day - INTERVAL (n-2) DAY order by day asc limit 0,1)
as end_day,
dd.a from (
select 1 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
select * from datt where day = d1.day - INTERVAL 1 DAY and a=d1.a
)
union
select 2 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
select * from datt where day = d1.day + INTERVAL 1 DAY and a=d1.a
)
) dd
where n=1
and it's output is :
|| 1 || 2012-01-01 || 2012-01-01 || 4 ||
|| 1 || 2012-01-03 || 2012-01-04 || 4 ||
|| 1 || 2012-01-05 || 2012-01-07 || 5 ||
|| 1 || 2012-01-08 || 2012-01-10 || 4 ||
|| 1 || 2012-01-11 || 2012-01-12 || 5 ||
Note: Thats the result for non-existing 2012-01-02 in a 01-12 day table.
Upvotes: 2
Reputation: 1309
If I understand your question correctly, I would use something to the effect of:
SELECT MAX(day), MIN(day) FROM `YourTable` WHERE `category`= $cat AND `A`= $increment;
... and ...
$dateRange = $cat.","."$min"."|"."$max".",".$increment;
Upvotes: 0
Reputation: 12806
Is this what you mean?
SELECT
category,
MIN(t1.day),
MAX(t2.day),
a
FROM
`table` AS t1
INNER JOIN `table` AS t2 USING (category, a)
Upvotes: 0
Reputation: 58444
To make this work as you want it to, you should have two tables:
Where each period can have many days related to it through FOREIGN KEY
. With current table structure, the best you can do is to detect the continuous periods on PHP side.
Upvotes: 1