Reputation: 143
I have a query that runs really slow (15 20 seconds) when is not on memory and quite fast when is on memory (2s - 0.6s)
select count(distinct(concat(conexiones.tMacAdres,date_format(conexiones.fFecha,'%Y%m%d')))) as Conexiones,
sum(if(conexiones.tEvento='megusta',1,0)) as MeGusta,sum(if(conexiones.tEvento='megusta',conexiones.nAmigos,0)) as ImpactosMeGusta,
sum(if(conexiones.tEvento='checkin',1,0)) as CheckIn,sum(if(conexiones.tEvento='checkin',conexiones.nAmigos,0)) as ImpactosCheckIn,
min(conexiones.fFecha) Fecha_Inicio, now() Fecha_fin,datediff(now(),min(conexiones.fFecha)) as dias
from conexiones, instalaciones
where conexiones.idInstalacion=instalaciones.idInstalacion and conexiones.idInstalacion=190
and (fFecha between '2014-01-01 00:00:00' and '2016-06-18 23:59:59')
group by instalaciones.tNombre
order by instalaciones.idCliente
This is Table SCHEMAS: Instalaciones with 1332 rows:
CREATE TABLE `instalaciones` (
`idInstalacion` int(10) unsigned NOT NULL AUTO_INCREMENT,
`idCliente` int(10) unsigned DEFAULT NULL,
`tRouterSerial` varchar(50) DEFAULT NULL,
`tFacebookPage` varchar(256) DEFAULT NULL,
`tidFacebook` varchar(64) DEFAULT NULL,
`tNombre` varchar(128) DEFAULT NULL,
`tMensaje` varchar(128) DEFAULT NULL,
`tWebPage` varchar(128) DEFAULT NULL,
`tDireccion` varchar(128) DEFAULT NULL,
`tPoblacion` varchar(128) DEFAULT NULL,
`tProvincia` varchar(64) DEFAULT NULL,
`tCodigoPosta` varchar(8) DEFAULT NULL,
`tLatitud` decimal(15,12) DEFAULT NULL,
`tLongitud` decimal(15,12) DEFAULT NULL,
`tSSID1` varchar(40) DEFAULT NULL,
`tSSID2` varchar(40) DEFAULT NULL,
`tSSID2_Pass` varchar(40) DEFAULT NULL,
`fSincro` datetime DEFAULT NULL,
`tEstado` varchar(10) DEFAULT NULL,
`tHotspot` varchar(10) DEFAULT NULL,
`fAlta` datetime DEFAULT NULL,
PRIMARY KEY (`idInstalacion`),
UNIQUE KEY `tRouterSerial` (`tRouterSerial`),
KEY `idInstalacion` (`idInstalacion`)
) ENGINE=InnoDB AUTO_INCREMENT=1332 DEFAULT CHARSET=utf8;
Conexiones with 2370365 rows
CREATE TABLE `conexiones` (
`idConexion` int(10) unsigned NOT NULL AUTO_INCREMENT,
`idInstalacion` int(10) unsigned DEFAULT NULL,
`idUsuario` int(11) DEFAULT NULL,
`tMacAdres` varchar(64) DEFAULT NULL,
`tUsuario` varchar(128) DEFAULT NULL,
`tNombre` varchar(64) DEFAULT NULL,
`tApellido` varchar(64) DEFAULT NULL,
`tEmail` varchar(64) DEFAULT NULL,
`tSexo` varchar(20) DEFAULT NULL,
`fNacimiento` date DEFAULT NULL,
`nAmigos` int(11) DEFAULT NULL,
`tPoblacion` varchar(64) DEFAULT NULL,
`fFecha` datetime DEFAULT NULL,
`tEvento` varchar(20) DEFAULT NULL,
PRIMARY KEY (`idConexion`),
KEY `idInstalacion` (`idInstalacion`),
KEY `tMacAdress` (`tMacAdres`) USING BTREE,
KEY `fFecha` (`fFecha`),
KEY `idUsuario` (`idUsuario`),
KEY `insta_fecha` (`idInstalacion`,`fFecha`)
) ENGINE=InnoDB AUTO_INCREMENT=2370365 DEFAULT CHARSET=utf8;
This is EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE instalaciones const PRIMARY,idInstalacion PRIMARY 4 const 1
1 SIMPLE conexiones ref idInstalacion,fFecha,insta_fecha idInstalacion 5 const 110234 "Using where"
Thanks !
(Edited)
SHOW TABLE STATUS LIKE 'conexiones'
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment
conexiones InnoDB 10 Compact 2305296 151 350060544 0 331661312 75497472 2433305 28/06/2016 22:26 NULL NULL utf8_general_ci NULL
Upvotes: 0
Views: 78
Reputation: 142298
Here's why it is so slow. And I will end with a possible speedup.
First, please do
SELECT COUNT(*) FROM conexiones
WHERE idInstalacion=190
and fFecha >= '2014-01-01'
and fFecha < '2016-06-19
in order to see how many rows we are dealing with. The EXPLAIN suggests 110234, but that is only a crude estimate.
Assuming there are 110K rows of conexiones
involved in the query, and assuming the rows were (approximately) inserted in chronological order by fFecha
, then...
Let's further check on my last claim... How much RAM do you have? What is the value of innodb_buffer_pool_size
? It should be about 70% of available RAM. Use a lower percentage if you have less than 4GB of RAM.
Assuming that conexiones
is too big to be 'cached' in the 'buffer_pool', we need to find a way to decrease the I/O.
There are 1332 different values for idInstalacion
. Perhaps you insert 1332 rows every few minutes/hours into conexiones
? Since the PRIMARY KEY
merely an AUTO_INCREMENT
, those rows will be 'appended' to the end of the table.
Now let's look at where the idInstalacion=190
rows are. A new one of them occurs every 1332 (or so) rows. That means they are spread out. It means that (probably) no two rows are in the same block (16KB in InnoDB). That means that the 110234 will be in 110234 different blocks. That's about 2GB. If the buffer_pool is smaller than that, then there will be I/O. Even if it is bigger than that, that's a lot of data to touch.
But what to do about it? If we could arrange the =190
rows to be consecutive in the table, then the 2GB might drop to, say, 20MB -- a much more manageable and cacheable size. But how can that be done? By changing the PRIMARY KEY
.
PRIMARY KEY(idInstalacion, fFecha, idConexion),
INDEX(idConexion)
and DROP
any other indexes starting with idInstalacion
or idConexion
. To explain:
idInstalacion=190
rows over any consecutive fFetcha
range will be consecutive in the data. So, fetching one block will get about 100 rows -- much less I/O.(idInstalacion, fFecha)
is not unique, I tacked on idConexion
to make it unique.INDEX(idConexion)
to make AUTO_INCREMENT
happy.Potential drawback... Since this change rearranges the order of the data, other queries, including the INSERTs
may be slowed down. The INSERTs
will be scattered, but not really slowed down. 1332 "hots spots" would be accepting the new rows; that many blocks can easily be cached.
Arithmetic... If you have spinning drives, I would expect the existing structure to take about 1102 seconds (perhaps under 110 seconds for SSD) for 110234 rows. Since it is taking under 20 seconds, I suspect there is some caching (or you have SSDs) or the 110234 is grossly overestimated. My suggested change should decrease the "worst" time significantly, and slightly improve the "in memory" time. This "slight improvement" comes from being able to use the PK instead of a secondary key.
Caveat: Since 110234 * 1332 is nowhere near 2370365, much of my numerical analysis is probably nowhere near correct. For example, 2370365 rows with that schema is possible less than 1GB. Please provide SHOW TABLE STATUS LIKE 'conexiones'
.
Addenda
"server has 2GB Ram and innodb_buffer_pool_size is 5368709120" -- Either that is a typo or it is terrible. Since the buffer_pool needs to reside in RAM, do not set the buffer_pool to 5GB. 500MB might be OK for your tiny 2GB of RAM.
The SHOW TABLE STATUS
confirms that it (data + indexes) won't quite fit in 500M, so you may periodically experience I/O bound queries with 500M.
Increasing your RAM and buffer_pool would temporarily (until the data gets bigger) help performance.
Before putting this into production, test the ALTER
and time the various queries you use:
ALTER TABLE conexiones
DROP PRIMARY KEY,
DROP INDEX insta_fecha,
DROP INDEX idInstalacion,
PRIMARY KEY(idInstalacion, fFecha, idConexion),
INDEX(idConexion)
Caution: The ALTER
will need about 1GB of free disk space.
When timing, run with the Query Cache off, and run twice -- the first may involve I/O; the second is the 'in memory' as you mentioned.
Revised analysis: Since the bigger table has 300MB of data and some amount of indexes in use, and assuming 500MB buffer pool, I suspect that blocks are bumped out of the buffer pool some of the time. This fits well with your initial comment on the query's speed. My suggested index changes should help avoid the speed variance, but may hurt the performance of other queries.
Upvotes: 1
Reputation: 31792
Try the following:
Either delete the idInstalacion
INDEX or tell the engine to use the correct key in the from clause:
from conexiones use index (insta_fecha), instalaciones
And you don't need to JOIN, GROUP or ORDER. You are joining on a constant value (190) with one row. And you don't use any column from instalaciones
.
So all you need is this:
select count(distinct(concat(conexiones.tMacAdres,date_format(conexiones.fFecha,'%Y%m%d')))) as Conexiones,
sum(if(conexiones.tEvento='megusta',1,0)) as MeGusta,sum(if(conexiones.tEvento='megusta',conexiones.nAmigos,0)) as ImpactosMeGusta,
sum(if(conexiones.tEvento='checkin',1,0)) as CheckIn,sum(if(conexiones.tEvento='checkin',conexiones.nAmigos,0)) as ImpactosCheckIn,
min(conexiones.fFecha) Fecha_Inicio, now() Fecha_fin,datediff(now(),min(conexiones.fFecha)) as dias
from conexiones -- use index (insta_fecha)
where conexiones.idInstalacion=190
and (fFecha between '2014-01-01 00:00:00' and '2016-06-18 23:59:59')
However - it doesn't mean it will be faster. MySQL will probably optimize all that stuff away.
Upvotes: 0
Reputation: 6084
Try to use a multi column index:
CREATE idx_nn_1 ON conexiones(idInstalacion,fFecha);
You might need to have it the other way around depending on the data, so test both. This avoids reading all the records for between condition on fFecha
matching the idInstalacion
condition, and should improve performance.
Upvotes: 0