Sk1ppeR
Sk1ppeR

Reputation: 387

MySQL subquery in the same table kills performance

Basically I'm trying to create a "suggestions" page based on user interests.

In a accumulative table I store all the products seen by every user. My idea was to select all seen products of all people that have visited products seen by me. I've been trying to come up with a query for awhile now but my best idea was going

Thing is, I think doing it with few queries is not scalable if my table grows. With the subqueries I had the query destroyed my database even though when i used EXPLAIN everything seemed fine (no temp table, no disk hits) yet when I raw the query it took over a minute to execute ... in a table with ~40k rows. Running the queries one by one did what I wanted in under a second so I'm really baffled. Where am I doing a mistake?

The table has the following columns id (PRIMARY), user_id, product_id and bunch of unneeded fields

The following is the SQL query I came up with (that kills my server)

SELECT product_id
FROM user_behavior
WHERE user_id
IN (

    SELECT user_id
    FROM user_behavior
    WHERE user_id <> 43456
    AND product_id
    IN (

        SELECT product_id
        FROM user_behavior
        WHERE user_id =43456
        GROUP BY product_id
    )
    AND offer_city_id
    IN ( 0, 2 )
)

As I said running explain returns the following

| id    | select_type           | table             | type              | possible_keys         | key       | key_len   | ref       | rows      | Extra                     |
|----   |--------------------   |---------------    |----------------   |-------------------    |---------  |---------  |-------    |-------    |-------------------------- |
| 1     | PRIMARY               | user_behavior     | index             | NULL                  | user_id   | 8         | NULL      | 25800     | Using where; using index  |
| 2     | DEPENDENT SUBQUERY    | user_behavior     | index_subquery    | user_id,user_id_2     | user_id   | 4         | func      | 3         | Using where               |
| 3     | DEPENDENT SUBQUERY    | user_behavior     | ref               | user_id,user_id_2     | user_id   | 4         | const     | 76        | Using where; using index  |

EDIT: I am sorry, I can't visualize a table :(

Upvotes: 0

Views: 477

Answers (1)

Rick James
Rick James

Reputation: 142356

Don't use IN ( SELECT ... ).

I am quite lost in what the query is trying to do, but switching to JOIN and EXISTS is likely to be part of the solution. Perhaps something close to this:

SELECT  s.product_id
    FROM  user_behavior AS a
    JOIN  user_behavior AS b  ON b.user_id = a.user_id
    WHERE  EXISTS (
              SELECT  *
                  FROM  user_behavior
                  WHERE  product_id = b.product_id
                    AND  user_id = 43456
                  )
      AND  b.offer_city_id IN ( 0, 2 )
      AND  b.user_id <> 43456 

And include a 'composite' INDEX(user_id, product_id) (in either order)

Or maybe only this is needed??...

SELECT DISTINCT product_id
    FROM  user_behavior AS b
    WHERE  EXISTS (
              SELECT  *
                  FROM  user_behavior
                  WHERE  product_id = b.product_id
                    AND  user_id = 43456
                  )
      AND  offer_city_id IN ( 0, 2 )
      AND  user_id <> 43456 

Well, I hope you get some ideas from these attempts.

Upvotes: 1

Related Questions