alexfsk
alexfsk

Reputation: 29

SLOW QUERY / IN HAVING Clause

I have a many-to-many relationship database in MySQL

mysql_database_structure

And this Query:

SELECT main_id FROM posts_tag
    WHERE post_id IN ('134','140','187')
    GROUP BY main_id
    HAVING COUNT(DISTINCT post_id) = 3

There are ~5,300,000 rows into this table and that query seems to be slow like 5 seconds (and slower if I add more ids into search)

I want to ask if there is any way to make it faster?

EXPLAIN shows this: mysql_query

By the way, I want to add more conditions like NOT IN and possible JOIN new tables which has same structure but different data. Not so much like this but first I want to know if there is any way to make that simple query faster?

Any advice would be helpful, even another method, or structure etc.

PS: Hardware is Intel Core i9 3.6Ghz, 64GB RAM, 480GB SSD. So I think the server specs is not a problem.

Upvotes: 0

Views: 580

Answers (2)

Rick James
Rick James

Reputation: 142373

Use a "composite" and "covering" index:

INDEX(post_id, main_id)

And get rid of INDEX(post_id) since it will then be redundant.

"Covering" helps speed up a query.

Assuming this is a normal "many-to-many" table, then:

CREATE TABLE post_main (
    post_id -- similar to `id` in table `posts`
    main_id -- similar to `id` in table `main`
    PRIMARY KEY(post_id, main_id),
    INDEX(main_id, post_id)
) ENGINE=InnoDB;

There is no need for AUTO_INCREMENT anywhere in a many-to-many table.

(You could add FK constraints, but I say 'why bother'.)

More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

And NOT IN

This gets a bit tricky. I think this is one way; there may be others.

SELECT  main_id
    FROM post_main
    WHERE post_id IN (244,229,193,93,61)
    GROUP BY main_id AS x
    HAVING COUNT(*) = 5
      AND NOT EXISTS ( SELECT 1
                  FROM post_main
                  WHERE main_id = x.main_id
                    AND post_id IN (92,10,234) );

Upvotes: 1

Wilson Hauck
Wilson Hauck

Reputation: 2343

Alexfsk, your Query on the second line has the IN variables surrounded by single quotes. When your column name is defined as INT or mediumint (or any kind of int) datatype, adding the single quotes around the data causes datatype conversion delays on every row considered and delays completion of your query.

Upvotes: 0

Related Questions