Reputation: 2037
I have an event input of this type
event user
event start
event end
event type
Inserted to MySql table, each in its own row with user+start as primary key.
I need to query an histogram for a type by time interval (say minute) counting events occurred on each time interval. something like:
SELECT count(*) as hits FROM events
WHERE type="browsing"
GROUP BY time_diff("2015-1-1" AND "2015-1-2") / 60 * second
but I could not find any way to do that in SQL besides writing code, any idea?
Sample data
user, start, end, type
1, 2015-1-1 12:00:00, 2015-1-1 12:03:59, browsing
2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, browsing
2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, eating
3, 2015-1-1 12:03:00, 2015-1-1 12:08:00, browsing
the result should look like this:
^
count |
browsing |
users | *
| * * * *
| * * * * * * * *
--|--|--|--|--|--|--|--|--|--> minute
0 1 2 3 4 5 6 7 8 9
Upvotes: 9
Views: 8750
Reputation: 1334
You can do this using group by with the level that you want. Here is an example using the data you gave:
First the SQL to create the table and populate it. The ID column here isn't "needed" but it is recommended if the table will be large or have indexes on it.
CREATE TABLE `test`.`events` (
`id` INT NOT NULL AUTO_INCREMENT,
`user` INT NULL,
`start` DATETIME NULL,
`end` DATETIME NULL,
`type` VARCHAR(45) NULL,
PRIMARY KEY (`id`));
INSERT INTO events (user, start, end, type) VALUES
(1, '2015-1-1 12:00:00', '2015-1-1 12:03:59', 'browsing'),
(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'browsing'),
(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'eating'),
(3, '2015-1-1 12:03:00', '2015-1-1 12:08:00', 'browsing');
To get a list of ordered pairs of number of minutes duration to number of events:
The query can then be easily written using the timestampdiff fuction, as shown below:
SELECT
TIMESTAMPDIFF(MINUTE, start, end) as minutes,
COUNT(*) AS numEvents
FROM
test.events
GROUP BY TIMESTAMPDIFF(MINUTE, start, end)
The output:
minutes numEvents
3 3
5 1
The first parameter in the select can be one of FRAC_SECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR.
Here are some more examples of queries you can do:
Events by hour (floor function is applied)
SELECT
TIMESTAMPDIFF(HOUR, start, end) as hours,
COUNT(*) AS numEvents
FROM
test.events
GROUP BY TIMESTAMPDIFF(HOUR, start, end)
**Events by hour with better formatting **
SELECT
CONCAT("<", TIMESTAMPDIFF(HOUR, start, end) + 1) as hours,
COUNT(*) AS numEvents
FROM
test.events
GROUP BY TIMESTAMPDIFF(HOUR, start, end)
You can group by a variety of options, but this should definitely get you started. Most plotting packages will allow you to specify arbitrary x y coordinates, so you don't need to worry about the missing values on the x axis.
To get a list of ordered pairs of number of events at a specific time (for logging): Note that this is left for reference.
Now for the queries. First you have to pick which item you want to use for the grouping. For example, a task might take more than a minute, so the start and end would be in different minutes. For all these examples, I am basing them off of the start time, since that is when the event actually took place.
To group event counts by minute, you can use a query like this:
SELECT
DATE_FORMAT(start, '%M %e, %Y %h:%i %p') as minute,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start), MINUTE(start);
Note how this groups by all the items, starting with year, going the minute. I also have the minute displayed as a label. The resulting output looks like this:
minute numEvents
January 1, 2015 12:00 PM 1
January 1, 2015 12:03 PM 3
This is data that you could then take using php and prepare it for display by one of the many graphing libraries out there, plotting the minute column on the x axis, and plotting the numEvents on the y axis.
Here are some more examples of queries you can do:
Events by hour
SELECT
DATE_FORMAT(start, '%M %e, %Y %h %p') as hour,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start);
Events by date
SELECT
DATE_FORMAT(start, '%M %e, %Y') as date,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start);
Events by month
SELECT
DATE_FORMAT(start, '%M %Y') as date,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start), MONTH(start);
Events by year
SELECT
DATE_FORMAT(start, '%Y') as date,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start);
I should also point out that if you have an index on the start column for this table, these queries will complete quickly, even with hundreds of millions of rows.
Hope this helps! Let me know if you have any other questions about this.
Upvotes: 11
Reputation: 1269553
I am going to assume that you have a numbers table that contains integers. You also have $starttime
and $endtime
.
This is one way to get the values you want:
select ($starttime + interval n.n - 1 minute) as thetime, n.n as minutes,
count(sd.user)
from numbers n left join
sampledata sd
on $starttime + interval n.n - 1 minute between sd.start and sd.end
where $starttime + interval n.n - 1 minute <= $endtime and
sd.end >= $starttime and
sd.start <= $endtime
group by n.n
order by n.n;
Upvotes: 0