Reputation: 37
my SQL(with sub-queries) take so long(nearly 24hour). Is using sub-queries is not good for performance?
My table as below
mysql> show create table eventnew;
CREATE TABLE `eventnew` (
`id` int(50) NOT NULL AUTO_INCREMENT,
`date` datetime DEFAULT NULL,
`src_ip` int(10) unsigned DEFAULT NULL,
`src_port` int(10) unsigned DEFAULT NULL,
`dst_ip` int(10) unsigned DEFAULT NULL,
`dst_port` int(10) unsigned DEFAULT NULL,
`repo_ip` varchar(50) DEFAULT NULL,
`link` varchar(50) DEFAULT NULL,
`binary_hash` varchar(50) DEFAULT NULL,
`sensor_id` varchar(50) DEFAULT NULL,
`repox_ip` int(10) unsigned DEFAULT NULL,
`flags` varchar(50) DEFAULT NULL,
`shellcode` varchar(1000) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`),
KEY `sensor_id` (`sensor_id`),
KEY `src_ip` (`src_ip`)
) ENGINE=MyISAM AUTO_INCREMENT=883278 DEFAULT CHARSET=latin1
my SQL as below:
SELECT COUNT( DISTINCT binary_hash ) AS cnt
FROM eventnew
WHERE DATE >= '2010-10-16'
AND DATE < '2010-10-17'
AND binary_hash NOT
IN (
SELECT DISTINCT binary_hash
FROM eventnew
WHERE DATE < '2010-10-16'
AND binary_hash IS NOT NULL
)
below are result running EXPLAIN:
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
| 1 | PRIMARY | eventnew | range | date | date | 9 | NULL | 14296 | Using where |
| 2 | DEPENDENT SUBQUERY | eventnew | range | date | date | 9 | NULL | 384974 | Using where |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
Upvotes: 1
Views: 957
Reputation: 17643
Try this
SELECT COUNT( DISTINCT a.binary_hash ) AS cnt
FROM eventnew a left join eventnew b on (a.binary_hash=b.binary_hash AND b.binary_hash IS NOT NULL AND b.DATE < '2010-10-16')
WHERE a.DATE >= '2010-10-16'
AND a.DATE < '2010-10-17'
and b.date is null
Upvotes: 0
Reputation: 1952
Using subqueries certainly does affect your performance. For instance, lets say a Table T1 has 'n' records and T2 has 'm' records. when you do a join on T1 and T2, it will take n*m records and then will sort them based on your condition. The same case goes with in keyword as well. and if you have another constraint in subquery, it would further decrease the efficiency. However, using subqueries couldn't be avoided in practice as they are meant to be.
Upvotes: 2