Alexis Espósito
Alexis Espósito

Reputation: 23

Performance challenge on MySQL RDS with simultaneous queries

I have an app developed over NodeJS on AWS that has a MySQL RDS database (server class: db.r3.large - Engine: InnoDB) associated. We are having a performance problem, when we execute simultaneous queries (at the same time), the database is returning the results after finishing the last query and not after each query is finished.

So, as an example: if we execute a process that has 10 simultaneous queries of 3 seconds each, we start receiving the results at approximately 30 seconds and we want to start receiving when the first query is finished (3 seconds).

It seems that the database is receiving the queries and make a queue of them.

I'm kind of lost here since I changed several things (separate connections, pool connections, etc) of the code and the settings of AWS but doesn’t seem to improve the result.

TableA (13M records) schema:

CREATE TABLE `TableA` (
  `columnA` int(11) NOT NULL AUTO_INCREMENT,
  `columnB` varchar(20) DEFAULT NULL,
  `columnC` varchar(15) DEFAULT NULL,
  `columnD` varchar(20) DEFAULT NULL,
  `columnE` varchar(255) DEFAULT NULL,
  `columnF` varchar(255) DEFAULT NULL,
  `columnG` varchar(255) DEFAULT NULL,
  `columnH` varchar(10) DEFAULT NULL,
  `columnI` bigint(11) DEFAULT NULL,
  `columnJ` bigint(11) DEFAULT NULL,
  `columnK` varchar(5) DEFAULT NULL,
  `columnL` varchar(50) DEFAULT NULL,
  `columnM` varchar(20) DEFAULT NULL,
  `columnN` int(1) DEFAULT NULL,
  `columnO` int(1) DEFAULT '0',
  `columnP` datetime NOT NULL,
  `columnQ` datetime NOT NULL,
  PRIMARY KEY (`columnA`),
  KEY `columnB` (`columnB`),
  KEY `columnO` (`columnO`),
  KEY `columnK` (`columnK`),
  KEY `columnN` (`columnN`),
  FULLTEXT KEY `columnE` (`columnE`)
) ENGINE=InnoDB AUTO_INCREMENT=13867504 DEFAULT CHARSET=utf8;

TableB (15M records) schema:

CREATE TABLE `TableB` (
  `columnA` int(11) NOT NULL AUTO_INCREMENT,
  `columnB` varchar(50) DEFAULT NULL,
  `columnC` varchar(50) DEFAULT NULL,
  `columnD` int(1) DEFAULT NULL,
  `columnE` datetime NOT NULL,
  `columnF` datetime NOT NULL,
  PRIMARY KEY (`columnA`),
  KEY `columnB` (`columnB`),
  KEY `columnC` (`columnC`)
) ENGINE=InnoDB AUTO_INCREMENT=19153275 DEFAULT CHARSET=utf8;

Query:

SELECT COUNT(*) AS total
FROM TableA 
WHERE TableA.columnB IN (  
    SELECT TableB.columnC
    FROM TableB  
    WHERE TableB.columnB = "3764301"  
    AND TableB.columnC NOT IN (   
        SELECT field   
        FROM table   
        WHERE table.field = 10
    AND TableB.columnC NOT IN (    
        SELECT field   
        FROM table   
        WHERE table.field = 10 
    AND TableB.columnC NOT IN (    
        SELECT field   
        FROM table   
        WHERE table.field = 10
    AND TableB.columnC NOT IN (   
        SELECT field   
        FROM table   
        WHERE table.field = 10
)
AND columnM > 2;

To see that queries are running I'm using "SHOW FULL PROCESSLIST" and the queries are most of the time with state "sending data".

It is not a performance issue about the query, it is a problem of recurrence over database. Even a very simple query like "SELECT COUNT(*) FROM TableA WHERE columnM = 5" has the same problem.

UPDATE

Only for testing purpose I reduce the query to only one subquery condition. Both results have 65k records.

-- USING IN
SELECT COUNT(*) as total 
FROM TableA 
WHERE TableA.columnB IN (  
    SELECT TableB.columnC  
       FROM TableB
       WHERE TableB.columnB = "103550181"  
       AND TableB.columnC NOT IN (   
           SELECT field   
           FROM tableX   
           WHERE fieldX = 15  
    )
) 
AND columnM > 2;

-- USING EXISTS
SELECT COUNT(*) as total 
FROM TableA 
WHERE EXISTS (  
    SELECT *  
    FROM TableB  
    WHERE TableB.columnB = "103550181"  
    AND TableA.columnB = TableB.columnC
    AND NOT EXISTS (   
        SELECT *
        FROM tableX
        WHERE fieldX = 15
        AND fieldY = TableB.columnC
   )
) 
AND columnM > 2;

-- Result
Query using IN : 1.7 sec
Query using EXISTS : 141 sec (:O)

Using IN or EXISTS the problem is the same, when I execute many times this query the data base have a delay and the response comes after a lot of time. Example: If one query response in 1.7 sec, if I execute 10 times this query, the first result is in 20 sec.

Upvotes: 1

Views: 288

Answers (1)

Rick James
Rick James

Reputation: 142453

Recommendation 1

Change the NOT IN ( SELECT ... ) to NOT EXISTS ( SELECT * ... ). (And you may need to change the WHERE clause a bit.

AND TableB.columnC NOT IN (    
    SELECT field   
    FROM table   
    WHERE table.field = 10 

-->

AND NOT EXISTS ( SELECT * FROM table WHERE field = TableB.columnC )

table needs an index on field.

IN ( SELECT ... ) performs very poorly. EXISTS is much better optimized.

Recommendation 2

To deal with the concurrency, consider doing SET SESSION TRANSACTION READ UNCOMMITTED before the query. This may keep one connection from interfering with another.

Recommendation 3

Show us the EXPLAIN, the indexes (SHOW CREATE TABLE) (what you gave is not sufficient), and the WHERE clauses so we can critique the indexes.

Recommendation 4

It might help for TableB to have a composite INDEX(ColumnB, ColumnC) in that order.

Upvotes: 1

Related Questions