danneth

Reputation: 2791

MySQL table index optimization

I'm working with an application that has a MySQL database on Amazon RDS. The table in question is set up as follows:

CREATE TABLE `log` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `timestamp` datetime NOT NULL,
  `username` varchar(45) NOT NULL,
  .. snip some varchar and int fields ..
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

This system has been in beta for a while, and the dataset is already quite large. Queries are starting to get rather slow.

SELECT COUNT(*) FROM log --> 16307224 (takes 105 seconds to complete)

This table is used almost exclusively to build one report, from a query like this:

SELECT timestamp, username, [a few more] FROM log 
WHERE timestamp  BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00' 
AND username='XX' 

This typically returns between 1000 and 6000 rows and takes around 100-180 seconds to complete, so the web application often times out and leaves an empty report. (I will look into the timeout as well, but this question is about the root cause.)

I'm not very good with databases, but my guess is that the BETWEEN is what's killing me here. What I'm thinking is that I should somehow use the timestamp as an index. Timestamp together with username should still provide uniqueness (I don't use the id field for anything).
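Before dropping id in favor of (timestamp, username), note that the pair is only unique if a user never produces two log rows within the same second, because a plain DATETIME column stores whole seconds. A minimal sketch of the risk, using Python's built-in sqlite3 module as a stand-in for MySQL (the trimmed-down schema and the index name ux_user_ts are hypothetical): with a UNIQUE index on the pair, a second event in the same second is rejected. MySQL would likewise reject it with a duplicate-key error.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL `log` table.
# Hypothetical simplified schema: only the columns relevant here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, "
    "timestamp TEXT NOT NULL, username TEXT NOT NULL)"
)
# Enforce the proposed "timestamp + username is unique" assumption.
conn.execute("CREATE UNIQUE INDEX ux_user_ts ON log (username, timestamp)")

insert = "INSERT INTO log (timestamp, username) VALUES (?, ?)"
conn.execute(insert, ("2012-03-30 08:00:01", "XX"))
try:
    # Same user, same second: a second event the application wants to log.
    conn.execute(insert, ("2012-03-30 08:00:01", "XX"))
    rejected = False
except sqlite3.IntegrityError:
    # The unique index refuses the duplicate (username, timestamp) pair.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print("second row rejected:", rejected, "| rows stored:", row_count)
```

Keeping the auto-increment id as the primary key and adding a non-unique secondary index on the two columns avoids this problem.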

If anyone out there has suggestions for optimizations, I'm all ears.

UPDATE:

The table has now been altered to the following:

CREATE TABLE `log` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `timestamp` datetime NOT NULL,
  `username` varchar(45) NOT NULL,
  .. snip ..
  `task_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_un_ts` (`timestamp`,`username`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

EXPLAIN of the SELECT statement returns the following

id => 1
select_type => SIMPLE
table => log
type => range
possible_keys => index_un_ts
key => index_un_ts
key_len => 55
ref => 
rows => 52258
Extra => Using where; Using index
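The MySQL manual's section on range optimization for multiple-part indexes says the optimizer stops using further key parts once it reaches a range operator such as BETWEEN. With index_un_ts defined as (timestamp, username), that would mean only the timestamp part narrows the scan and every user's rows in the eight-hour window are read; this may be why EXPLAIN estimates 52258 rows while the report returns 1000 to 6000. Putting the equality column first, (username, timestamp), lets both columns narrow the search. The sketch below compares the two orders using Python's built-in sqlite3 module as a stand-in, since SQLite's planner follows the same leftmost-prefix rule (its plan text differs from MySQL's EXPLAIN; the task_id column and the index names are illustrative).

```python
import sqlite3

# The report query from the question, with task_id standing in for "[a few more]".
QUERY = (
    "SELECT timestamp, username, task_id FROM log "
    "WHERE timestamp BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00' "
    "AND username = 'XX'"
)


def plan_for(index_columns):
    """Return SQLite's query-plan detail for QUERY given one composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (id INTEGER PRIMARY KEY, timestamp TEXT NOT NULL, "
        "username TEXT NOT NULL, task_id INTEGER)"
    )
    name = "idx_" + "_".join(index_columns)
    conn.execute(f"CREATE INDEX {name} ON log ({', '.join(index_columns)})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    return " | ".join(row[-1] for row in rows)


ts_first = plan_for(["timestamp", "username"])    # same order as index_un_ts
user_first = plan_for(["username", "timestamp"])  # equality column first

print(ts_first)
print(user_first)
```

In the output, the (timestamp, username) plan should list only the timestamp range as search terms, while the (username, timestamp) plan should also list username=?.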

Upvotes: 0

Views: 1882

Answers (1)

Namphibian

Reputation: 12221

Well, an index on the timestamp and username columns would help. You need to be able to read the output of an EXPLAIN statement.

In MySQL, run the following:

EXPLAIN SELECT timestamp, username, [a few more] FROM log 
WHERE timestamp  BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00' 
AND username='XX' 

This shows you the plan MySQL uses to execute the query. There will be a column called key, which indicates which index MySQL is using for the query. I suspect you will see NULL there and ALL in the type column, which means MySQL is scanning the table from top to bottom, matching every row against your WHERE clause. Now create an index on the timestamp and username columns and run the EXPLAIN statement again. You should see the index you created in the key column.
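The before-and-after check described above can be tried without a MySQL server. This minimal sketch uses Python's built-in sqlite3 module, whose EXPLAIN QUERY PLAN plays the role of MySQL's EXPLAIN: a full table scan is reported as SCAN (the counterpart of type ALL) and an index lookup as SEARCH ... USING INDEX. The simplified schema and the index name idx_log_user_ts are hypothetical, and the index puts username first because it is compared with = while timestamp is compared with a range.

```python
import sqlite3

QUERY = (
    "SELECT timestamp, username FROM log "
    "WHERE timestamp BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00' "
    "AND username = 'XX'"
)


def explain(conn, sql):
    """Join the detail strings of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (id INTEGER PRIMARY KEY, "
    "timestamp TEXT NOT NULL, username TEXT NOT NULL)"
)

before = explain(conn, QUERY)  # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_log_user_ts ON log (username, timestamp)")
after = explain(conn, QUERY)   # the new index should now appear in the plan

print("before:", before)
print("after: ", after)
```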

If MySQL uses the index, your query should be considerably faster. Just remember not to over-index: indexes make inserts, updates and deletes slower. When you insert a new row into a table that has three indexes, an entry has to be written to each of the three indexes. So it is a double-edged sword.

Upvotes: 1
