Reputation: 705
We have the following data structure in a MySQL table which basically logs user actions on pages
id int
page_id int
user_id int
action_type enum(6)
date_created datetime
We have the following indexes:
id Primary key
user_id-page_id-date_created unique
page_id-user_id-date_created
user_id
page_id-date_created
Our issue is that this table currently has 125 million rows and is growing at a rate of 0.8 million a day, which makes the inserts take about 2 hours to complete. The inserts are made through 3 queries that select data from 3 other tables. What could we do to improve this time? Should we drop MySQL and try other database solutions?
L.E.: Based on your feedback, I am trying to give more info. First of all, the tables are MyISAM, these inserts happen once every night in a cron job, and we never delete any data from them. Here is how I handle the inserts. I will refer to the big table as big_table, and to each of the 3 source tables as content_table, because they are similar in structure. The EXPLAINs below are for the largest of the 3 tables, which has around 108.5 million rows. First, I get the id from which I should start inserting, using PHP. (I'm OK with the 3 minutes for the non-indexed query that gets it.)
SELECT id FROM content_table WHERE date_created > "2012-04-18" ORDER BY id ASC LIMIT 1;
+-----------+
| id |
+-----------+
| 107278872 |
+-----------+
1 row in set (3 min 15.52 sec)
EXPLAIN SELECT id FROM content_table WHERE date_created > "2012-04-18" ORDER BY id ASC LIMIT 1;
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | content_table | index | NULL | PRIMARY | 4 | NULL | 1 | Using where |
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.06 sec)
Then, using this id, I do the following:
INSERT IGNORE INTO big_table (user_id, page_id, type, date_created)
SELECT user_id, page_id, IF (is_admin,"admin_action","action") as type, created_time FROM content_table WHERE id >= "107278872";
Here is what the EXPLAIN for the SELECT looks like:
EXPLAIN SELECT user_id, page_id, IF (is_admin,"admin_action","action") as type, created_time FROM content_table WHERE id >= "107278872";
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
| 1 | SIMPLE | content_table | range | PRIMARY | PRIMARY | 4 | NULL | 777864 | Using where |
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)
I've also tried it in phpMyAdmin and got times of around 0.004s, so I think it's the inserts that take the time, not the data fetching. All I know about the server is that it is a quad-core Xeon @ 2.4 GHz with 16 GB of RAM; I do not know anything about the storage yet (I will come back as soon as I have that info). The data is not used only for logging: we need statistics such as which users were most active on pages, various groupings, etc., and the user can specify any interval for these.
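For reference, the two steps described above can be collapsed into a single statement. This is only a sketch using the table and column names from the question; the subquery assumes that id increases with date_created, which the separate lookup query also relies on:

```sql
-- Sketch: find the starting id and copy the rows in one INSERT ... SELECT,
-- instead of fetching the id separately in PHP first.
INSERT IGNORE INTO big_table (user_id, page_id, type, date_created)
SELECT user_id,
       page_id,
       IF(is_admin, 'admin_action', 'action') AS type,
       created_time
FROM content_table
WHERE id >= (SELECT MIN(id)
             FROM content_table
             WHERE date_created > '2012-04-18');
```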
Upvotes: 0
Views: 307
Reputation: 18576
How are you inserting them? You can only run a certain number of transactions per second. Looping single-row queries like this
//start loop
INSERT INTO table VALUES (1);
//end loop
will be MUCH slower than
//start loop
//append this row's values to a string
//end loop
INSERT INTO table VALUES (1),(2),(3),(4); // where (1),(2),(3),(4) are values accumulated by the loop
(Note that you can't insert too many values in one statement like this; try it with your data. I usually find around 200 rows per statement is a pretty good value.)
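As a concrete sketch of the two patterns, using the question's column names and made-up values:

```sql
-- Slow: one statement per row, paying per-statement overhead each time.
INSERT INTO big_table (user_id, page_id, type, date_created)
VALUES (1, 10, 'action', '2012-04-18 00:00:00');

-- Faster: one multi-row statement built up by the loop
-- (around 200 rows per statement, per the note above).
INSERT INTO big_table (user_id, page_id, type, date_created)
VALUES (1, 10, 'action', '2012-04-18 00:00:00'),
       (2, 10, 'action', '2012-04-18 00:01:00'),
       (3, 11, 'admin_action', '2012-04-18 00:02:00');
```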
You don't have a huge number of indexes, so I don't think they are causing problems. phpMyAdmin shows the index size; check what that is compared to the total table size - that may give you an idea of whether it's storing too much.
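To compare index size with data size without phpMyAdmin, a standard query against information_schema works (substitute your own schema name, which is hypothetical here):

```sql
-- Data_length vs Index_length for the table, in bytes.
SELECT table_name,
       data_length,
       index_length
FROM information_schema.TABLES
WHERE table_schema = 'your_db'   -- hypothetical schema name
  AND table_name = 'big_table';

-- SHOW TABLE STATUS reports the same figures per table.
SHOW TABLE STATUS LIKE 'big_table';
```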
Upvotes: 0
Reputation: 4069
There are many factors affecting insert speed. First, tell us how you are inserting the data; second, which storage engine the table uses. Then we can optimize your insert query further. In general, I can say that unnecessary indexes mean a slow insert rate.
For more explanation on the speed of inserts, read the dev.mysql.com article on insert speed.
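For example, in the question's index list, the standalone user_id index is a left prefix of the unique (user_id, page_id, date_created) index, so it is likely redundant write overhead. A sketch of removing it (a suggestion to verify against your query patterns, not a definitive fix):

```sql
-- Lookups on user_id alone can be served by the composite unique index,
-- so this standalone index may be paying insert cost for no benefit.
ALTER TABLE big_table DROP INDEX user_id;
```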
Upvotes: 0
Reputation: 309028
You could:
All relational databases will suffer from having to deal with too much data. Your first thought should not be to drop MySQL; it should be figuring out what your archiving strategy needs to be. You have to decide exactly how much of that data is needed in your transactional store at a given time.
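A minimal sketch of such an archiving step, assuming an identically structured big_table_archive table (a hypothetical name) and a cutoff date chosen by the retention policy:

```sql
-- Move rows older than the cutoff out of the transactional store.
INSERT INTO big_table_archive
SELECT * FROM big_table
WHERE date_created < '2012-01-01';

DELETE FROM big_table
WHERE date_created < '2012-01-01';
```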
Upvotes: 3