Fox Alex
Fox Alex

Reputation: 133

MySQL right join slow performance

I have two tables:

Restaurants and SurveyInvitation.

One restaurants has many survey invitation.

I want to select all the restaurants that have a survey invitation and the status of it is 'approved', 'completed', 'hidden_review' .

The restaurants table has ~1400 rows and the survey invitation ~2.4 milion rows.

This is my query

SELECT  `Restaurant`.`id`
FROM   `restaurants` AS `Restaurant`
RIGHT JOIN `survey_invitations` AS `SurveyInvitations`
           ON ( `SurveyInvitations`.`restaurant_id` = `Restaurant`.`id`
                AND `SurveyInvitations`.`status` 
                IN (
                    'approved', 'completed', 'hidden_review'
                   ) 
               )
WHERE  `Restaurant`.`country_id` = 53
AND `Restaurant`.`area_id` IN ( 1, 16, 27, 118,
                                   219, 221, 222, 223,
                                   224, 225, 230, 231,
                                   235, 236, 237, 238,
                                   239, 240, 248, 226,
                                   241, 244, 246, 227,
                                   245, 228, 229, 242,
                                   243, 249 )

group by `Restaurant`.`id`

This runs in 1.235 sec.

Running explain gives

https://jsfiddle.net/bjuepb9j/3

I also tried this but no luck still 1.2 sec

SELECT  `Restaurant`.`id`
FROM   `db_portal`.`restaurants` AS `Restaurant`
RIGHT JOIN  (
    select `restaurant_id` from `survey_invitations` AS `SurveyInvitations`
    where `SurveyInvitations`.`status` 
    IN ('approved', 'hidden_review', 'completed')
)  AS `SurveyInvitations`
ON (
`SurveyInvitations`.`restaurant_id` = `Restaurant`.`id`
)
WHERE  `Restaurant`.`country_id` = 53
AND `Restaurant`.`area_id` IN ( 1, 16, 27, 118,
                                   219, 221, 222, 223,
                                   224, 225, 230, 231,
                                   235, 236, 237, 238,
                                   239, 240, 248, 226,
                                   241, 244, 246, 227,
                                   245, 228, 229, 242,
                                   243, 249 )

group by `Restaurant`.`id`

Explain is the same.

In the fiddle there is also the result from show index on both of the tables.

1.2 sec for a ~2.4 million rows is to much I think. Maybe the indexes are wrong, I'm not that good at this kind of stuff.

Edit.1. https://jsfiddle.net/bjuepb9j/6/

Has show create table and show columns of survey_invitations

Upvotes: 1

Views: 1164

Answers (2)

Tom Shir
Tom Shir

Reputation: 482

I would recommend replacing the join with an IN subquery, and not an EXISTS subquery. When writing the query with an IN subquery, you avoid the correlated EXISTS query, which can occasionally be slower (depending on the amount of results). Try this:

SELECT
        r.id 
    FROM
        restaurants r 
    WHERE
        r.country_id = 53 
        AND r.area_id IN (
            1, 16, 27, 118, 219, 221, 222, 223, 224, 225, 230, 231, 235, 236, 237, 238, 239, 240, 248, 226, 241, 244, 246, 227, 245, 228, 229, 242, 243, 249
        ) 
        AND r.id IN (
            (
                SELECT
                    si.restaurant_id 
                FROM
                    survey_invitations si 
                WHERE
                    1 = 1 
                    AND si.status IN (
                        'approved', 'completed', 'hidden_review'
                    )
            )
        )

For this query, add these indexes:

ALTER TABLE `restaurants` ADD INDEX `restaurants_index_1` (`country_id`, `area_id`, `id`); 
ALTER TABLE `survey_invitations` ADD INDEX `survey_invitations_index_1` (`restaurant_id`, `status`);

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269973

Use exists:

SELECT r.id
FROM restaurants r
WHERE r.country_id = 53 AND
      r.area_id IN (1, 16, 27, 118, 219, 221, 222, 223,
                    224, 225, 230, 231, 235, 236, 237, 238,
                    239, 240, 248, 226, 241, 244, 246, 227,
                    245, 228, 229, 242, 243, 249
                   ) AND
      EXISTS (SELECT 1
              FROM survey_invitations si
              WHERE si.restaurant_id = r.id AND
                    si.status IN ('approved', 'completed', 'hidden_review') 
             );

Then, for this query you want indexes on restaurants(country_id, area_id, id) and survey_invitations(restaurant_id, status).

A right join is entirely unneeded for your query. The where clause turn it into an inner join anyway. In all likelihood, the expense of the query is in the group by. This version eliminates that.

Upvotes: 1

Related Questions