Tinclon
Tinclon

Reputation: 967

MySQL taking forever 'sending data'. Simple query, lots of data

I'm trying to run what I believe to be a simple query on a fairly large dataset, and it's taking a very long time to execute -- it stalls in the "Sending data" state for 3-4 hours or more.

The table looks like this:

CREATE TABLE `transaction` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`uuid` varchar(36) NOT NULL,
`userId` varchar(64) NOT NULL,
`protocol` int(11) NOT NULL,
... A few other fields: ints and small varchars
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `uuid` (`uuid`),
KEY `userId` (`userId`),
KEY `protocol` (`protocol`),
KEY `created` (`created`)
) ENGINE=InnoDB AUTO_INCREMENT=61 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 COMMENT='Transaction audit table'

And the query is here:

select protocol, count(distinct userId) as count from transaction
where created > '2012-01-15 23:59:59' and created <= '2012-02-14 23:59:59'
group by protocol;

The table has approximately 222 million rows, and the where clause in the query filters down to about 20 million rows. The distinct option will bring it down to about 700,000 distinct rows, and then after grouping, (and when the query finally finishes), 4 to 5 rows are actually returned.

I realize that it's a lot of data, but it seems that 4-5 hours is an awfully long time for this query.

Thanks.

Edit: For reference, this is running on AWS on a db.m2.4xlarge RDS database instance.

Upvotes: 2

Views: 9908

Answers (4)

nulll
nulll

Reputation: 1603

Starting from a certain version of MariaDB (maybe since 10.5), I noticed that after importing a dump with

mysql dbname < dump.sql

the optimizer thinks things different from how they are, making the wrong decisions about indexes.

In general even listing tables innodb with phpmyadmin becomes very very slow.

I noticed that running

ANALYZE TABLE myTable;

fixes.

So after each import I run, that it's equal to run ANALYZE on each table

mysqlcheck -aA

Upvotes: 0

newtover
newtover

Reputation: 32094

This is a really heavy query. To understand why it takes so long you should understand the details.

You have a range condition on the indexed field, that is MySQL finds the smallest created value in the index and for each value it gets the corresponding primary key from the index, retrieves the row from disk, and fetches the required fields (protocol, userId) missing in the current index record, puts them in a "temporary table", making the groupings on those 700000 rows. The index can actually be used and is used here only for speeding up the range condition.

The only way to speed it up, is to have an index that contains all the necessary data, so that MySQL would not need to make on disk lookups for the rows. That is called a covering index. But you should understand that the index will reside in memory and will contain ~ sizeOf(created+protocol+userId+PK)*rowCount bytes, that may become a burden as itself for the queries that update the table and for other indexes. It is easier to create a separate aggregates table and periodically update the table using your query.

Upvotes: 3

Mahesh Patil
Mahesh Patil

Reputation: 1551

Why don't you profile a query and see what exactly is happening?

SET PROFILING = 1; 
SET profiling_history_size = 0; 
SET profiling_history_size = 15; 
/* Your query should be here */
SHOW PROFILES; 
SELECT state, ROUND(SUM(duration),5) AS `duration (summed) in sec` FROM information_schema.profiling WHERE query_id = 3 GROUP BY state ORDER BY `duration (summed) in sec` DESC; 
SET PROFILING = 0; 
EXPLAIN /* Your query again should appear here */;

I think this will help you in seeing where exactly query takes time and based on result you can perform optimization operations.

Upvotes: 11

idstam
idstam

Reputation: 2878

Both distinct and group by will need to sort and store temporary data on the server. With that much data that might take a while.

Indexing different combinations of userId, created and protocol will help, but I can't say how much or what index will help the most.

Upvotes: 1

Related Questions