Reputation: 4050
I am wondering if anyone can help me answer this technical question. I have a list of things that can or cannot happen at any given date/time during the course of a year. Let's say there are a million items. To simplify let's say this thing can or cannot happen every half of an hour, 365 days a year. Therefore there are 17,520 possible binary values. I need to query against this in real time to say whether or not something can happen at time x. For obvious reasons it is not efficient to create a table with 17,520 columns.
Can anyone recommend any methods that would allow me to handle such data? Thanks.
Upvotes: 2
Views: 80
Reputation: 108651
The SQLish way to do this is to record rows in a table that look roughly like this, with the name of the event and the timestamp.
event | when
happened | 2014-01-01 10:13
happened | 2014-01-01 10:33
didnt | 2014-01-01 11:13
Then, to get a summary of the events by hour for yesterday, you can do this:
SELECT DATE_FORMAT(when,'%Y-%m-%d %H:00') AS when,
SUM(IF('happened'=event,1,0)) AS happened,
SUM(IF('didnt'=event,1,0)) AS didnt
FROM event
WHERE when >= CURDATE() - INTERVAL 1 DAY
AND when < CURDATE()
GROUP BY DATE_FORMAT(when,'%Y-%m-%d %H:00')
ORDER BY DATE_FORMAT(when,'%Y-%m-%d %H:00')
Some notes here: The DATE_FORMAT(when,'%Y-%m-%d %H:00')
expression reduces dates to the nearest hour. The WHERE
expression picks up everything from midnight yesterday up to but not including midnight today.
If you want to summarize by half-hour, use this GROUP BY
expression instead:
DATE_FORMAT(obstime,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(obstime) - MINUTE(obstime) MOD 30) MINUTE
In the SQL world there's no need to encode this sort of timeseries data in hyperoptimized data structures like bit strings. That's basically true in all modern computers; disk space is really cheap.
Upvotes: 0
Reputation: 2195
I'd recommend using something like a bitmap to store this type of data.
This should be easy to manipulate and would store efficiently (possibly compressed).
retrieve the bitmap for the event and 'AND' it with your timeslot. To find out if it is a valid time or not.
Upvotes: 1
Reputation: 25205
3 tables:
table 1 : things (id, thingname)
table 2 : all possible times thing(s) from table 1 can happen times -(id,time)
table 3 : mapping table of things -> times they can happen thing_times(thing_id,time_id)
if you want to look up can thing x happen at time y
select * from thing_times inner join times on thing_times.timeid = times.id inner join
things on thing_times.thingid = things.id where times.time = $canItHappenTime and
things.thingname = $thingIWantToCheckTimeFor
Or you can leave out the join to table1 if you already know the id of the thing
also, make sure you add indexes on the columns (thingname and time) you are searching on so the query doesn't have to do table scans...
Upvotes: 0
Reputation: 1842
I recommend to create a time lapse dimension table, the 17520 records, which you can use to make reports for example of at what time lapses did nothing happened.
Keep in mind that more inefficient are many column tables rather than a two or three columns table like this but long on records.
Upvotes: 0
Reputation: 101
Depending on the amount of times the event happen/doesn't happen, i would recommend saving the difference between dates in a given measurement (half of an hour, i guess?). If there's a lot of times where it doesn't happen, then you can save times it does happen and skip through times where it doesn't.
Ex,
if you have
v = 1 0 0 1 1 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1
you can save it as
v2 = 0 2 0 4 0 0 4 1 2 4
if you save this as something smaller than an int, this can also be efficient in space.
Upvotes: 0