Reputation: 140
I am creating a database to store data from a monitoring system that I have created. The system takes a bunch of data points(~4000) a couple times every minute and stores them in my database. I need to be able to down sample based on the time stamp. Right now I am planning on using one table with three columns:
results:
1. point_id
2. timestamp
3. value
so the query I'd be like to do would be:
SELECT point_id,
MAX(value) AS value
FROM results
WHERE timestamp BETWEEN date1 AND date2
GROUP BY point_id;
The problem I am running into is this seems super inefficient with respect to memory. Using this structure each time stamp would have to be recorded 4000 times, which seems a bit excessive to me. The only solutions I thought of that reduce the memory footprint of my database requires me to either use separate tables (which to my understanding is super bad practice) or storing the data in CSV files which would require me to write my own code to search through the data (which to my understanding requires me not to be a bum... and probably search substantially slower). Is there a database structure that I could implement that doesn't require me to store so much duplicate data?
Upvotes: 0
Views: 98
Reputation: 10316
Using this structure each time stamp would have to be recorded 4000 times, which seems a bit excessive to me.
Not really. Date values are not that big and storing the same value for each row is perfectly reasonable.
...use separate tables (which to my understanding is super bad practice)
Who told you that!!! Normalising data (splitting into separate, linked data structures) is actually a good practise - so long as you don't overdo it - and SQL is designed to perform well with relational tables. It would perfectly fine to create a "time" table and link to the data in the other table. It would use a little more memory, but that really shouldn't concern you unless you are working in a very limited memory environment.
Upvotes: 0
Reputation: 1271151
A database on with your data structure is going to be less efficient than custom code. Guess what. That is not unusual.
First, though, I think you should wait until this is actually a performance problem. A timestamp
with no fractional seconds requires 4 bytes (see here). So, a record would have, say 4+4+8=16 bytes (assuming a double floating point representation for value
). By removing the timestamp you would get 12 bytes -- savings of 25%. I'm not saying that is unimportant. I am saying that other considerations -- such as getting the code to work -- might be more important.
Based on your data, the difference is between 184 Mbytes/day and 138 Mbytes/day, or 67 Gbytes/year and 50 Gbytes. You know, you are going to have to deal with biggish data issues regardless of how you store the timestamp.
Keeping the timestamp in the data will allow you other optimizations, notably the use of partitions to store each day in a separate file. This should be a big benefit for your queries, assuming the where
conditions are partition-compatible. (Learn about partitioning here.) You may also need indexes, although partitions should be sufficient for your particular query example.
The point of SQL is not that it is the most optimal way to solve any given problem. Instead, it offers a reasonable solution to a very wide range of problems, and it offers many different capabilities that would be difficult to implement individually. So, the time to a reasonable solution is much, much less than developing bespoke code.
Upvotes: 1