user3092543
user3092543

Reputation: 51

MySQL select not in is really slow

This is what I have now.

SELECT distinct ClientID 
    FROM Table
    WHERE PAmt = '' and ClientID not in 
        (select distinct ClientID from Table where PAmt != '') 
    ORDER BY ID ASC

ClientID can be inside the Table more then once and some of them have PAmt value some don't. I am trying to get only the clientid's that never had a PAmt value. The table has about 12000 entry's and only 2700 are unique clientid's

Upvotes: 0

Views: 72

Answers (4)

smurtagh
smurtagh

Reputation: 529

Try using not exists instead of not in.

http://sqlfiddle.com/#!2/249cd5/26

Upvotes: 0

Abhi Beckert
Abhi Beckert

Reputation: 33349

Take away the subquery:

$clientIds = runQuery("select distinct ClientID from Table where PAmt != ''");
$clientIds = implode(",", $clientIds);

runQuery("SELECT distinct ClientID 
FROM Table
WHERE PAmt = '' and ClientID not in 
    ({$clientIds}) 
ORDER BY ID ASC")

I know it looks like MySQL should do that optimisation step for you, but it doesn't. You will find your query is about 12000 times faster if you do the sub query as a separate query.

Upvotes: 0

Eugen Rieck
Eugen Rieck

Reputation: 65264

I think this could be easier solved by

SELECT 
  ClientID,
  MAX(IF(PAmt='',0,1)) AS HasPAmt
FROM `Table`
GROUP BY ClientID
HAVING HasPAmt=0

Edit

Some words on the rationale behind this:

Subqueries in MySQL are a beasty thing: If the resultset is either too large (Original SQL) or intertwined with the driving query (@DavidFleeman's answer), the inner query is looped, i.e. it is repeated for every row of the driving query. This ofcourse gives bad performance.

So we try to reformulate the query in a way, that will avoid looping. My suggestion works by running only two queries: The first (everything before the HAVING) will create a temp table, that marks each distinct ClientID as having at least one non-empty PAmt (or not), the second selects only those rows of the temp table, that are marked as having none, into the final result set.

Upvotes: 3

Iłya Bursov
Iłya Bursov

Reputation: 24146

try to reorganize your query to something like this:

select clientID
from Table
group by clientID
having max(length(PAmt)) == 0

of course you should add index (clientID, PAmt)

if this query will still work slow, add column with pre-calculated length, and replace PAmt with this column

Upvotes: 1

Related Questions