Malith Wijethunga
Malith Wijethunga

Reputation: 138

MySQL Query Optimization for JOIN Large Tables

I have a problem with my MySQL query with large data access, When the query optimized with join it gives the output within 122 seconds for the data of one week. Then for one month data it takes 526 seconds for the process. I want to optimize this query for less amount of process time per year or if there any way to optimize MySQL settings in general ?

Table details. I refer two tables which mdiaries and tv_diaries,In both tables I have indexed relevant columns, In mdiaries table there are 2661331 rows and 27074645 rows in tv_diaries.

mdiaries table:

  INDEX area (area),
  INDEX date (date),
  INDEX district (district),
  INDEX gaDivision (gaDivision),
  INDEX member_id (member_id),
  INDEX tv_channel_id (tv_channel_id),

tv_diaries.

  INDEX area (area),
  INDEX date (date),
  INDEX district (district),
  INDEX member_id (member_id),
  INDEX timeslot_id (timeslot_id),
  INDEX tv_channel_id (tv_channel_id),

This is my query which takes 122 seconds to execute.

$sql = "SELECT COUNT(TvDiary.id) AS m_count,TvDiary.date,TvDiary.timeslot_id,TvDiary.tv_channel_id,TvDiary.district,TvDiary.area
FROM `mdiaries` AS Mdiary INNER JOIN `tv_diaries` AS TvDiary ON Mdiary.member_id = TvDiary.member_id
WHERE Mdiary.date >= '2014-01-01' AND Mdiary.date <= '2014-01-07'
AND TvDiary.date >= '2014-01-01' AND TvDiary.date <= '2014-01-07'
GROUP BY TvDiary.date,
TvDiary.timeslot_id,
TvDiary.tv_channel_id,
TvDiary.district,
TvDiary.area";

This is my.cnf file.

    [mysqld]

## General
datadir                         = /var/lib/mysql
tmpdir                          = /var/lib/mysqltmp
socket                          = /var/lib/mysql/mysql.sock
skip-name-resolve
sql-mode                        = NO_ENGINE_SUBSTITUTION
#event-scheduler                = 1

## Networking
back-log                        = 100
#max-connections                = 200
max-connect-errors              = 10000
max-allowed-packet              = 32M
interactive-timeout             = 3600
wait-timeout                    = 600

### Storage Engines
#default-storage-engine         = InnoDB
innodb                          = FORCE

## MyISAM
key-buffer-size                 = 64M
myisam-sort-buffer-size         = 128M

## InnoDB
innodb-buffer-pool-size        = 16G
innodb_buffer_pool_instances    = 16
#innodb-log-file-size           = 100M
#innodb-log-buffer-size         = 8M
#innodb-file-per-table          = 1
#innodb-open-files              = 300

## Replication
server-id                       = 1
#log-bin                        = /var/log/mysql/bin-log
#relay-log                      = /var/log/mysql/relay-log
relay-log-space-limit           = 16G
expire-logs-days                = 7
#read-only                      = 1
#sync-binlog                    = 1
#log-slave-updates              = 1
#binlog-format                  = STATEMENT
#auto-increment-offset          = 1
#auto-increment-increment       = 2

## Logging
log-output                      = FILE
slow-query-log                  = 1
slow-query-log-file             = /var/log/mysql/slow-log
#log-slow-slave-statements
long-query-time                 = 2

##
query_cache_size        = 512M
query_cache_type        = 1
query_cache_limit       = 2M
join_buffer_size        = 512M
thread_cache_size       = 128

[mysqld_safe]
log-error                       = /var/log/mysqld.log
open-files-limit                = 65535

[mysql]
no-auto-rehash

Upvotes: 3

Views: 3261

Answers (4)

Zafar Malik
Zafar Malik

Reputation: 6854

Not sure but it can provide you better performance-

SELECT COUNT(t.id) AS m_count, t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area
FROM `mdiaries` m 
JOIN 
(
SELECT t.id, t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area, t.member_id 
FROM `tv_diaries` AS t
WHERE t.date >= '2014-01-01' AND t.date <= '2014-01-07' 
) t ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07' 
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;

You can also check your db config setting as I am seeing below issues-

  1. innodb_file_per_table=1 is commented: if it is true then data will be stored in single ibd file instead of table wise.

  2. tmp_table_size and max_heap_table_size can improve performance as you are trying to fetch data from heavy tables. so try to set both of them as at least 100M to avoid temp table creation on disk if your query is creating temp table on disk.

  3. as you are using group by, so sort_buffer_size variable can help if you increase it. can set 2M.

  4. join_buffer_size is too high it should be near about 2M can set max. 8M but not 512M as it used session wise so eat all your memory.

  5. also you have set query_cache_size too high as 512M, so free memory from here, you can also check by mysqltuner report that actually you are getting benefit of caching query or not if not then you can disable it.

Upvotes: 1

Mr. E
Mr. E

Reputation: 2120

Maybe you could use a materialized view to store the result of the query and refresh it periodically (monthly? 15 days?)

This will not optimize your query but your consults will be way faster (It won't calculate again the count)

Upvotes: 0

Roman Hocke
Roman Hocke

Reputation: 4239

Try adding multiple-columns index on all columns referenced in GROUP BY clause, as mentioned in the documentation.

INDEX grp (date, timeslot_id, tv_channel_id, district, area)

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1270401

This is your query:

SELECT COUNT(t.id) AS m_count, t.date, t.timeslot_id, t.tv_channel_id,
       t.district, t.area
FROM `mdiaries` m INNER JOIN
     `tv_diaries` t
     ON m.member_id = t.member_id
WHERE m.date >= '2014-01-01' AND m.date <= '2014-01-07' AND
      t.date >= '2014-01-01' AND t.date <= '2014-01-07'
GROUP BY t.date, t.timeslot_id, t.tv_channel_id, t.district, t.area;

I would start with composite indexes: tv_diaries(date, member_id) and mdiaries(member_id, date).

This query is problematic, but these might help.

Upvotes: 2

Related Questions