Reputation: 399
I have a database (DB) with ~4 million rows of data. One field is of "timestamp" type. Data from the DB is shown summarized on a series of web pages (too simple to call it a web site). On the first page I extract all available years from the timestamp field, which takes about 8-10 seconds. The data consists of real-time measurements ranging from 2009 to 2014, and new data is uploaded for random periods once a week.
My question is about that array of years.
What is the best way to handle such information (which is close to static)?
What I do now is extract data once, then store it in a session variable for further use.
Is it better to create a view from the SELECT query?
Any other ideas?
To clarify: the DB is MariaDB 5.5.35 on 64-bit Linux.
Thank you.
Upvotes: 0
Views: 184
Reputation: 396
First, if you're concerned about performance, make sure the data is effectively indexed for your query. That alone, if not already done, could dramatically improve the query times you're seeing.
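For example (a minimal sketch; the table name myFirstTable and timestamp column date are hypothetical placeholders, matching the names used in the queries below):
CREATE INDEX idx_myFirstTable_date ON myFirstTable (date);
-- With the index in place, the list of available years can be served
-- from an index scan rather than a full scan of the 4-million-row table:
SELECT DISTINCT YEAR(date) AS yr
FROM myFirstTable
ORDER BY yr;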
If you're concerned that you're querying data that doesn't change frequently, but need to reliably know that the data you're using is current, you might consider creating a cached version of your data with an INSERT trigger.
The scenario that I envision would have a secondary table that stores the intermediate results of your query (assuming it's not just a SELECT date FROM myFirstTable kind of query). I'll use the following query to illustrate my case (pretending this represents your first page of data):
SELECT date, COUNT(date) AS numberOf
FROM myFirstTable
GROUP BY date
ORDER BY date ASC;
Now, supposing this information changes infrequently, I might want to cache the results of this query in an intermediate table, which would be the data source you query against when generating your page. With an INSERT trigger on myFirstTable that recalculates this data as necessary, you can ensure that what's in that table is current (assuming the existence of an intermediate table mySecondTable (date DATE, numberOf INT)):
DELIMITER $$
CREATE TRIGGER trg_ins_myFirstTable AFTER INSERT ON myFirstTable
FOR EACH ROW
BEGIN
  -- Flush out intermediate results. DELETE is used here rather than
  -- TRUNCATE, because TRUNCATE performs an implicit commit and is not
  -- allowed inside a trigger:
  DELETE FROM mySecondTable;
  -- Rebuild the cached dataset:
  INSERT INTO mySecondTable (date, numberOf)
  SELECT date, COUNT(date) AS numberOf
  FROM myFirstTable
  GROUP BY date
  ORDER BY date ASC;
END$$
DELIMITER ;
Thereafter, you would query mySecondTable instead of myFirstTable, knowing not only that you are querying recent data, but also (assuming you were doing a complex query to generate your data) that you've front-loaded the work of generating results onto your insert statements. Be warned that this does have an adverse impact on insertion performance, in exchange for improved overall performance at query time.
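For instance, your first page could then be rendered from a query as trivial as this (again using the hypothetical mySecondTable from above):
SELECT date, numberOf
FROM mySecondTable
ORDER BY date ASC;
Since mySecondTable holds only one row per distinct date, this should return almost instantly regardless of how large myFirstTable grows.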
The above code is untested, and without knowing your exact situation (e.g. the query you're running, names of tables, whether you're able to create new tables) I don't know that this perfectly solves your problem.
Upvotes: 1