cperdana
cperdana

Reputation: 37

mysql sub queries performance

my SQL(with sub-queries) take so long(nearly 24hour). Is using sub-queries is not good for performance?

My table as below

mysql> show create table eventnew;
CREATE TABLE `eventnew` (
  `id` int(50) NOT NULL AUTO_INCREMENT,
  `date` datetime DEFAULT NULL,
  `src_ip` int(10) unsigned DEFAULT NULL,
  `src_port` int(10) unsigned DEFAULT NULL,
  `dst_ip` int(10) unsigned DEFAULT NULL,
  `dst_port` int(10) unsigned DEFAULT NULL,
  `repo_ip` varchar(50) DEFAULT NULL,
  `link` varchar(50) DEFAULT NULL,
  `binary_hash` varchar(50) DEFAULT NULL,
  `sensor_id` varchar(50) DEFAULT NULL,
  `repox_ip` int(10) unsigned DEFAULT NULL,
  `flags` varchar(50) DEFAULT NULL,
  `shellcode` varchar(1000) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `date` (`date`),
  KEY `sensor_id` (`sensor_id`),
  KEY `src_ip` (`src_ip`)
) ENGINE=MyISAM AUTO_INCREMENT=883278 DEFAULT CHARSET=latin1

my SQL as below:

SELECT COUNT( DISTINCT binary_hash ) AS cnt
FROM eventnew
WHERE DATE >=  '2010-10-16'
AND DATE <  '2010-10-17'
AND binary_hash NOT 
IN (

SELECT DISTINCT binary_hash
FROM eventnew
WHERE DATE <  '2010-10-16'
AND binary_hash IS NOT NULL
)

below are result running EXPLAIN:

+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type        | table    | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
|  1 | PRIMARY            | eventnew | range | date          | date | 9       | NULL |  14296 | Using where |
|  2 | DEPENDENT SUBQUERY | eventnew | range | date          | date | 9       | NULL | 384974 | Using where |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+

Upvotes: 1

Views: 957

Answers (3)

Florin Ghita
Florin Ghita

Reputation: 17643

Try this

SELECT COUNT( DISTINCT a.binary_hash ) AS cnt
FROM eventnew a left join eventnew b on (a.binary_hash=b.binary_hash AND b.binary_hash IS NOT NULL AND b.DATE <  '2010-10-16')
WHERE a.DATE >=  '2010-10-16'
AND a.DATE <  '2010-10-17'
and  b.date is null

Upvotes: 0

Ashok
Ashok

Reputation: 1952

Using subqueries certainly does affect your performance. For instance, lets say a Table T1 has 'n' records and T2 has 'm' records. when you do a join on T1 and T2, it will take n*m records and then will sort them based on your condition. The same case goes with in keyword as well. and if you have another constraint in subquery, it would further decrease the efficiency. However, using subqueries couldn't be avoided in practice as they are meant to be.

Upvotes: 2

shinkou
shinkou

Reputation: 5154

I'd suggest you use NOT EXISTS instead of NOT IN.

Upvotes: 0

Related Questions