Reputation: 919
I'm trying to figure out the best way to optimize my current selection query on a MySQL database.
I have two MySQL tables with a one-to-many relationship. One is the user table, which contains the unique list of users and has around 22k rows. The other is the linedata table, which contains all the possible coordinates for each user and has around 490k rows.
In this case we can assume the foreign key between the two tables is the id value. In the user table, id is also the auto-increment primary key, while in the linedata table it is not a primary key, because we can have more rows for the same user.
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`isActive` tinyint(4) NOT NULL,
`userId` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`gender` varchar(45) COLLATE utf8_unicode_ci NOT NULL,
`age` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=21938 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `linedata` (
`id` int(11) NOT NULL,
`userId` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`timestamp` datetime NOT NULL,
`x` float NOT NULL,
`y` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
SELECT
u.id,
u.isActive,
u.userId,
u.name,
u.gender,
u.age,
GROUP_CONCAT(CONCAT_WS(', ', timestamp, x, y)
ORDER BY timestamp ASC SEPARATOR '; '
) as linedata_0
FROM user u
JOIN linedata l
ON u.id=l.id
WHERE DATEDIFF(l.timestamp, '2018-02-28T20:00:00.000Z') >= 0
AND DATEDIFF(l.timestamp, '2018-11-20T09:20:08.218Z') <= 0
GROUP BY userId;
+----+-------------+-------+--------+---------------+---------+---------+------+--------+----------------------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE   | POSSIBLE_KEYS | KEY     | KEY_LEN | REF  | ROWS   | EXTRA                                        |
+----+-------------+-------+--------+---------------+---------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | l     | ALL    | NULL          | NULL    | NULL    | NULL | 491157 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | l.id |      1 | NULL                                         |
+----+-------------+-------+--------+---------------+---------+---------+------+--------+----------------------------------------------+
The selection query works if, for example, I add another WHERE condition to filter for single users. Let's say I select just 200 users: then I get around 14 seconds of execution time, and around 7 seconds if I select just the first 100 users. But with only the datetime range condition, it seems to load without ever finishing. Any suggestions?
After following Rick's suggestions, the query benchmark is now around 14 seconds. Here below is the EXPLAIN EXTENDED output:
+----+--------------------+-------+-------+--------------------+--------------------+---------+------+-------+----------+-----------------------+
| id | select_type        | table | type  | possible_keys      | key                | key_len | ref  | rows  | filtered | Extra                 |
+----+--------------------+-------+-------+--------------------+--------------------+---------+------+-------+----------+-----------------------+
|  1 | PRIMARY            | u     | index | PRIMARY            | PRIMARY            | 4       | NULL | 21959 | 100.00   | NULL                  |
|  1 | PRIMARY            | l     | ref   | id_timestamp_index | id_timestamp_index | 4       | u.id |    14 | 100.00   | Using index condition |
|  2 | DEPENDENT SUBQUERY | NULL  | NULL  | NULL               | NULL               | NULL    | NULL |  NULL | NULL     | No tables used        |
+----+--------------------+-------+-------+--------------------+--------------------+---------+------+-------+----------+-----------------------+
I have changed some values in the tables: the id in the user table can now be joined with userId in the linedata table, and both are integers. The only string column left is the userId value in the user table, because it is a sort of long string identifier like 0000309ab2912b2fd34350d7e6c079846bb6c5e1f97d3ccb053d15061433e77a_0. So, just to make a quick example, we will have in the user table and in the linedata table:
+----+----------+--------+------+--------+-----+
| id | isActive | userId | name | gender | age |
+----+----------+--------+------+--------+-----+
|  1 | 1        | x4by4d | john | m      |  22 |
|  2 | 1        | 3ub3ub | bob  | m      |  50 |
+----+----------+--------+------+--------+-----+
+----+--------+-----------+----+----+
| id | userId | timestamp | x  | y  |
+----+--------+-----------+----+----+
|  1 | 1      | somedate  | 30 | 10 |
|  2 | 1      | somedate  | 45 | 15 |
|  3 | 1      | somedate  | 50 | 20 |
|  4 | 2      | somedate  | 20 |  5 |
|  5 | 2      | somedate  | 25 | 10 |
+----+--------+-----------+----+----+
I have added a compound index made of the userId and timestamp values on the linedata table. What if, instead of having an auto-increment id value as the primary key of the linedata table, I add a composite primary key made of userId + timestamp? Should that increase performance, or maybe not?
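In other words, the two alternatives would look something like this (the index name matches the one in the EXPLAIN above; the composite primary key assumes no user ever has two rows with the same timestamp):
-- Option 1: keep the auto-increment id as PRIMARY KEY and add a
-- secondary compound index on (userId, timestamp):
ALTER TABLE linedata
  ADD INDEX id_timestamp_index (userId, `timestamp`);

-- Option 2: replace the surrogate primary key with a composite one.
-- Assumes (userId, timestamp) is unique, i.e. no user has two
-- coordinates recorded at the same instant:
ALTER TABLE linedata
  MODIFY id INT NOT NULL,   -- drop AUTO_INCREMENT so the PK can be dropped
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (userId, `timestamp`);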
Upvotes: 1
Views: 97
Reputation: 142560
I need to help you fix several bugs before discussing performance.
First of all, '2018-02-28T20:00:00.000Z' won't work in MySQL. It needs to be '2018-02-28 20:00:00.000', and something needs to be done about the timezone.
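If the trailing Z means UTC and your server runs in a different time zone, CONVERT_TZ with numeric offsets is one way to translate the literal ('+01:00' below is just a placeholder for your server's offset):
-- Translate the UTC instant into the server's local time;
-- numeric offsets work even without the time zone tables loaded:
SELECT CONVERT_TZ('2018-02-28 20:00:00.000', '+00:00', '+01:00');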
Then, don't "hide a column in a function". That is, DATEDIFF(l.timestamp, ...) cannot use any index on timestamp.
So, instead of
WHERE DATEDIFF(l.timestamp, '2018-02-28T20:00:00.000Z') >= 0
AND DATEDIFF(l.timestamp, '2018-11-20T09:20:08.218Z') <= 0
do something like
WHERE l.timestamp >= '2018-02-28 20:00:00.000'
AND l.timestamp < '2018-11-20 09:20:08.218'
I'm confused about the two tables. Both have id and userId, yet you join on id. Perhaps instead of
CREATE TABLE `linedata` (
`id` int(11) NOT NULL,
`userId` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
...
you meant
CREATE TABLE `linedata` (
`id` int(11) NOT NULL AUTO_INCREMENT, -- (the id for `linedata`)
`userId` int NOT NULL, -- to link to the other table
...
PRIMARY KEY(id)
...
Then there could be several linedata rows for each user.
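Spelled out in full, the corrected table might look like this (the non-key columns are simply carried over from your original DDL):
CREATE TABLE `linedata` (
  `id` int(11) NOT NULL AUTO_INCREMENT,  -- the id for `linedata`
  `userId` int NOT NULL,                 -- links to `user`.`id`
  `timestamp` datetime NOT NULL,
  `x` float NOT NULL,
  `y` float NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;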
At that point, this
JOIN linedata l ON u.id=l.id
becomes
JOIN linedata l ON u.id=l.userId
Now, for performance: linedata needs INDEX(userId, timestamp) - in that order.
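That is (the index name is arbitrary):
-- userId first to satisfy the join, timestamp second for the range scan:
ALTER TABLE linedata ADD INDEX idx_user_ts (userId, `timestamp`);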
Now, think about the output. You are asking for up to 22K rows, with possibly hundreds of "ts,x,y" strung together in one of the columns. What will receive this much data? Will it choke on it?
And GROUP_CONCAT has a default limit of 1024 bytes. That will allow for only about 50 points. If a 'user' can be in more than 50 spots across that nearly nine-month range, consider increasing group_concat_max_len before running the query.
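For example (1,000,000 bytes is an arbitrary value; size it to your worst-case user):
-- Raise the GROUP_CONCAT limit for the current session only:
SET SESSION group_concat_max_len = 1000000;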
To make it work even faster, reformulate it by moving the join and the date filter into a correlated subquery, which eliminates the GROUP BY entirely:
SELECT u.id, u.isActive, u.userId, u.name, u.gender, u.age,
    ( SELECT GROUP_CONCAT(CONCAT_WS(', ', l.timestamp, l.x, l.y)
                 ORDER BY l.timestamp ASC
                 SEPARATOR '; ')
        FROM linedata l
       WHERE l.userId = u.id
         AND l.timestamp >= '2018-02-28 20:00:00.000'
         AND l.timestamp <  '2018-11-20 09:20:08.218'
    ) AS linedata_0
  FROM user u;
Note that this version returns every user; linedata_0 is NULL for users with no points in the range.
Another thing: you probably want to be able to look up a user by name, so add INDEX(name).
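That is (the index name is arbitrary):
-- Secondary index to support lookups by name:
ALTER TABLE user ADD INDEX idx_name (name);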
Oh, what the heck is the VARCHAR(255) for userId?? Ids are normally integers.
Upvotes: 2