Overdeath

Reputation: 705

How can I improve insert times in a big table?

We have the following data structure in a MySQL table, which basically logs user actions on pages:

id int
page_id int
user_id int
action_type enum(6)
date_created datetime

We have the following indexes:

id Primary key
user_id-page_id-date_created unique
page_id-user_id-date_created
user_id
page_id-date_created

Our issue is that this table currently has 125 million rows, and it's growing at a rate of 0.8 million a day, which makes the inserts take about 2 hours to complete. The inserts are made through 3 queries that select data from 3 other tables. What could we do to improve this time? Should we drop MySQL and try other database solutions?

L.E.: Based on your feedback, I am trying to give more info. First of all, the tables are MyISAM, these inserts happen once every night in a cron job, and we do not delete any data from them. Here is how I handle the inserts. I will refer to the big table as big_table and to each of the 3 source tables as content_table, because they are similar in structure. The EXPLAINs below are for the largest of the 3 tables, which has around 108.5 million rows. First, I get the id from which I should start inserting, using PHP. (I'm OK with the 3 minutes for the non-indexed query that gets it.)

SELECT id FROM content_table WHERE date_created > "2012-04-18" ORDER BY id ASC LIMIT 1;
+-----------+
| id        |
+-----------+
| 107278872 |
+-----------+
1 row in set (3 min 15.52 sec)

EXPLAIN SELECT id FROM content_table WHERE date_created > "2012-04-18" ORDER BY id ASC LIMIT 1;
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table            | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | content_table    | index | NULL          | PRIMARY | 4       | NULL |    1 | Using where |
+----+-------------+------------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.06 sec)

Then, using this id, I do the following:

INSERT IGNORE INTO big_table (user_id, page_id, type, date_created) 
SELECT user_id, page_id, IF (is_admin,"admin_action","action") as type, created_time FROM content_table WHERE id >= "107278872";

Here's what the EXPLAIN for that SELECT looks like:

EXPLAIN SELECT user_id, page_id, IF (is_admin,"admin_action","action") as type, created_time FROM content_table WHERE id >= "107278872";
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table            | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | content_table    | range | PRIMARY       | PRIMARY | 4       | NULL | 777864 | Using where |
+----+-------------+------------------+-------+---------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)

I've also tried it in phpMyAdmin and got times of around 0.004 s, so I think it's the inserts that take the time, not the data fetching. All I know about the server is that it is a quad-core Xeon @ 2.4 GHz with 16 GB of RAM; I do not know anything about the storage yet (I will come back as soon as I have that info). And the data is not used for logging only: we need statistics such as which users were most active on which pages, various groupings, etc., and the user can specify any interval for these.
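Given that the SELECT side is fast, one thing worth measuring is whether a single 800k-row INSERT ... SELECT behaves worse than several smaller ones. A sketch of a chunked version, using the table and column names from the question (the chunk size of 100,000 is an assumption to be tuned on the real server):

```sql
-- Sketch: insert in chunks instead of one 800k-row statement, so each
-- statement holds locks and touches index buffers for a shorter time.
-- @start_id comes from the id-lookup query above; @chunk is a guess.
SET @start_id = 107278872;
SET @chunk    = 100000;

INSERT IGNORE INTO big_table (user_id, page_id, type, date_created)
SELECT user_id, page_id,
       IF(is_admin, 'admin_action', 'action') AS type,
       created_time
FROM content_table
WHERE id >= @start_id
  AND id <  @start_id + @chunk;
-- then advance @start_id by @chunk (e.g. from the driving PHP script)
-- and repeat until a chunk inserts zero rows
```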

Upvotes: 0

Views: 307

Answers (3)

exussum

Reputation: 18576

How are you inserting them? You can only get a certain number of transactions per second; looping queries like this

//start loop
insert into table values (1) 
//end loop

will be MUCH slower than

//start loop
//fill a variable 
//end loop
insert into table values (1),(2),(3),(4) // where (1),(2),(3),(4) are values filled by the loop

(Note that you can't insert too many values like this; try it with your data. I usually find 200 or so is a pretty good batch size.)
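The multi-row form this answer describes looks like the following in SQL (illustrative values, reusing the column names from the question):

```sql
-- One round trip and one index-update pass for many rows,
-- instead of one per row.
INSERT INTO big_table (user_id, page_id, type, date_created) VALUES
  (1, 10, 'action',       '2012-04-18 00:00:01'),
  (2, 10, 'action',       '2012-04-18 00:00:05'),
  (1, 11, 'admin_action', '2012-04-18 00:00:09');
-- Batches of a few hundred rows per statement are a common starting point;
-- max_allowed_packet caps how large a single statement may grow.
```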

You don't have a huge number of indexes, so I don't think they are causing problems. phpMyAdmin shows the index size; check that against the total table size, which may give you an idea of whether it's storing too much.
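The same data-vs-index comparison can be done without phpMyAdmin; a sketch against MySQL's information_schema (the table name is the one used in the question):

```sql
-- Compare data footprint to index footprint for the big table.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name   = 'big_table';
```

If index_mb rivals or exceeds data_mb, every insert is paying for several B-tree updates, which is where dropping a redundant index can help.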

Upvotes: 0

Ankit Sharma

Reputation: 4069

There are many factors affecting insert speed, such as:

  • Engine type
  • Indexes
  • how the data is inserted, etc.

First of all, tell us how you are inserting the data, and second, which storage engine the table uses; then we can further optimize your insert query. In general, unnecessary indexes mean a slow insert rate.

For more explanation, read the dev.mysql.com article on insert speed.
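Since the asker later confirms MyISAM, one standard MyISAM bulk-load pattern is to defer non-unique index maintenance around the batch. A sketch, reusing the statement from the question:

```sql
-- MyISAM only: stop updating non-unique indexes during the bulk insert,
-- then rebuild them in one pass at the end (usually faster than
-- updating each B-tree row by row).
ALTER TABLE big_table DISABLE KEYS;

INSERT IGNORE INTO big_table (user_id, page_id, type, date_created)
SELECT user_id, page_id,
       IF(is_admin, 'admin_action', 'action'),
       created_time
FROM content_table
WHERE id >= 107278872;

ALTER TABLE big_table ENABLE KEYS;  -- rebuilds the deferred indexes
```

Note that DISABLE KEYS does not disable unique indexes, so the INSERT IGNORE deduplication against the user_id-page_id-date_created unique key still works; only the two non-unique secondary indexes are deferred.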

Upvotes: 0

duffymo

Reputation: 309028

You could:

  1. Run EXPLAIN on the queries you run against the 3 other tables to see if they're properly indexed. Full table scans should be eliminated.
  2. Add indexes to those three other tables for each WHERE clause in the queries.
  3. Partition the data by day, week, month, or some other suitable mechanism so you can move the oldest data out to a reporting/warehousing solution.
  4. See whether a trigger-based solution could help you.
  5. Profile the database and monitor network traffic to see where the time is being spent.
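Point 3 could look like the following on MySQL 5.1+ (a sketch with assumed partition boundaries; note that MySQL requires the partitioning column to appear in every unique key, so the primary key and the unique index from the question would first need to be adjusted to include date_created):

```sql
-- Sketch: monthly range partitions on date_created.
ALTER TABLE big_table
PARTITION BY RANGE (TO_DAYS(date_created)) (
  PARTITION p201203 VALUES LESS THAN (TO_DAYS('2012-04-01')),
  PARTITION p201204 VALUES LESS THAN (TO_DAYS('2012-05-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Old data can then be removed cheaply, instead of with a huge DELETE:
-- ALTER TABLE big_table DROP PARTITION p201203;
```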

All relational databases will suffer from having to deal with too much data. Your first thought should not be to drop MySQL; it should be to figure out what your archiving strategy needs to be. You have to decide exactly how much of that data is needed in your transactional store at any given time.

Upvotes: 3
