bopritchard
bopritchard

Reputation: 399

MySQL Query/Table in need of optimization

I have a query that is taking an embarrassingly long time. ~7 minutes embarrassing. I would really appreciate some help. Missing indexes? Rewrite the query? All of the above?

Many thanks

mysql Ver 14.14 Distrib 5.7.25, for Linux (x86_64)

The query looks like:

SELECT COUNT(*) AS count_all, name 
FROM api_events ae
INNER JOIN products p on p.token=ae.product_token 
WHERE (ae.created_at > '2019-01-21 12:16:53.853732') 
GROUP BY name

Here are the two table definitions

api_events has ~31 million records

CREATE TABLE `api_events` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `api_name` varchar(200) NOT NULL,
  `hostname` varchar(200) NOT NULL,
  `controller_action` varchar(2000) NOT NULL,
  `duration` decimal(12,5) NOT NULL DEFAULT '0.00000',
  `view` decimal(12,5) NOT NULL DEFAULT '0.00000',
  `db` decimal(12,5) NOT NULL DEFAULT '0.00000',
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  `product_token` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `product_token` (`product_token`)
) ENGINE=InnoDB AUTO_INCREMENT=64851218 DEFAULT CHARSET=latin1;

and products has only 12 records

CREATE TABLE `products` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `code` varchar(30) NOT NULL,
  `name` varchar(100) NOT NULL,
  `description` varchar(2000) NOT NULL,
  `token` varchar(50) NOT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=19 DEFAULT CHARSET=latin1;

Upvotes: 0

Views: 82

Answers (4)

Rick James
Rick James

Reputation: 142540

For the query as stated, you needed

api_events:  INDEX(created_at, product_token)
products:    INDEX(token, name)

Because the WHERE mentions api_events, the Optimizer is likely to start with that table. created_at is in the WHERE, so the index starts with that, even though starting with a 'range' is usually wrong. In this case, the pair is "covering".

Then, INDEX(token, name) is also "covering".

"Covering" indexes give a small, but widely varying, amount of performance improvement.

Upvotes: 1

ScaisEdge
ScaisEdge

Reputation: 133400

You could improve the join performance adding index

create index  idx1 on api_events(product_token, created_at);

create index  idx2 on products(token);

You could also trying inverting the columns ofr api_events

 create index  idx1 on api_events(created_at, product_token); 

and trying add redundancy to product index

create index  idx2 on products(token, name);

Upvotes: 1

spencer7593
spencer7593

Reputation: 108510

It seems like the criteria on created_at is very selective (looking at only the past 7 days?). That's crying out to explore an index with created_at as a leading column.

The query is also referencing the product_token column from the same table, so we can include that column in the index, to make it a covering index.

  api_events_IX ON api_events ( created_at, product_token )

Using that index, we can probably avoid looking at the vast majority of the 31 million rows, and quickly narrow in on the subset of rows we actually need to look at.

Using the index, the query will still need a "Using filesort" operation to satisfy the GROUP BY.

(My guess here is that the join to the 12 rows in product doesn't exclude a lot of rows... that on the vast majority of rows in api_event the product_token refers to a row that exists in product.


Use MySQL EXPLAIN to see the query execution plan.


A further possible refinement (to test the performance of) would be to do some of the aggregation in an inline view:

 SELECT SUM(s.count_all) AS count_all
      , p.name
   FROM ( SELECT COUNT(*) AS count_all
               , ae.product_token
            FROM api_events ae
           WHERE ae.created_at > '2019-01-21 12:16:53.853732'
           GROUP
              BY ae.product_token
        ) s
   JOIN products p 
     ON p.token = s.product_token 
  GROUP
     BY p.name

If the assumption about product_token is misinformed, if there are lots of rows in api_event that have product_token values that don't reference a row in product ... we might take a different tack ...

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271151

What happens if you group by the token instead of the name?

SELECT ae.product_token, COUNT(*) AS count_all 
FROM api_events ae
WHERE ae.created_at > '2019-01-21 12:16:53.853732')
GROUP BY ae.product_token;

For this query, an index on api_events(created_at, product_token) will probably help.

If this is faster, then you can bring in the name using a subquery.

Upvotes: 0

Related Questions