Reputation: 290
Problem: We have a very big table, and growing. Most of its entries (say 80%) are historical data (with "DATE" field past current date) that are seldom queried, while small part of it (say 20%) are current data ("DATE" field after current date), most queries search these current entries.
Consider two possible scenarios, which one would be better (considering the overall implementation difficulty and performance,...)
Breaking the big table into two table: Historical and Current data. And on daily basis I move the records with expired date from Current table to Historical table.
Keeping record in one table (the DATA field is defined as INDEXED).
The scenario A would indicate more hustle in implementation and maintenance, and overload on daily bases for moving date between tables, while scenario B would indicate searching a big database (though indexed). Does it impose memory problems? Which scenario is recommended? IS there any other recommendations?
Upvotes: 5
Views: 423
Reputation: 49
In my experience, most systems with big data have historical tables. In most cases that I have been, both the current data and historical data have different user-groups. The current data are used by the front end users to deal with customers with their current or recent transactions. The historical data are usually used by the user groups who do not have to talk with customers/clients directly.
Do not worry much about the issue of implementation and maintenance as I think your main consideration is all about performance. Implementation is only a one-time deal that will run on a specified frequency (like weekly, monthly or yearly archival) after you moved the program/s in production. Maintenance is very small and you can just forget about it once it is already implemented. You just have to make sure that you test the programs thoroughly.
For a normalized historical tables, tables have the same structure and field names which makes the data copy much easier. This way, one can just to a table join between the tables.
If you choose to not split the data, you will continue to add index after index. But somewhere down the road, you will still encounter the same issue again.
Upvotes: 2
Reputation: 1270713
You usually don't want to break a big table into multiple tables, although having a current and historical table is totally reasonable. Your process makes sense. You can then optimize the current table for your query needs. I would probably go for two tables (given the limited information you provide), because it allows such optimization.
However, don't split the historical data. Instead, use partitioning. See the documentation. One caveat: queries need to specify the partitioning key in the where
clause to take advantage of the partitions. With a large table, this is typical anyway.
Upvotes: 4
Reputation: 544
Question: is the historical data necessary for system functionality or are these records stored for other purposes (e.g. audits)? It may be time to clean house by moving the historical data to an archive.
Upvotes: 2