Netanel Vaknin
Netanel Vaknin

Reputation: 1

Performance, Why JOIN is faster than IN

I tried to optimize some PHP code that performs a lot of queries on different tables (that include data).

The logic was to take some fields from each table by neighborhood id(s) depends whether it was city(a lot of neighborhoods ids) or specific neighborhood.

For example, assume that I have 10 tables of this format:

neighborhood_id     |     some_data_field

The queries were something like that:

SELECT `some_data_field` 
FROM `table_name` AS `data_table` 
LEFT JOIN `neighborhoods_table` AS `neighborhoods` ON  `data_table`.`neighborhood_id' = `neighborhoods`.`neighborhood_id` 
WHERE `neighborhood`.`city_code` = SOME_ID

Because there were like 10 queries like that I tried to optimize the code by removing the join from 10 queries and perform one query to neighborhoods table to get all neighborhood codes.

Then in each query I did WHERE IN on the neighborhoods ids.

The expected result was a better performance, but it turns out that it wasn't better.

When I perform a request to my server the first query takes 20ms the second takes more and the third takes more and so on. (the second and the third take something like 200ms) but with JOIN the first query takes 40ms but the rest of the queries take 20ms-30ms.

The first query in request shows us that where in is faster but I assume that MYSQL has some cache when dealing with JOINs.

So I wanted to know how can I improove my where in queries?

EDIT I read the answer and comments and I understood I didn't explain well why I have 10 tables because each table categorized by property.

For example, one table contains values by floor and one by rooms and one by date so it isn't possible to union all tables to one table.

Second Edit I'm still misunderstood.
I don't have only one data column per table, every table has it's own amount of fields, it can be 5 fields for one table and 3 for another. and different data types or formatting types, it can be date or money present ation ,additionally, I perform in my queries some calculations about those fields, some times it can be AVG or weighted average and in some tables it's only pure select. Additionally I perform group by by some fields in one table it can be by rooms and in other it can be by floor

Upvotes: 0

Views: 99

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269963

For example, assume that I have 10 tables of this format:

This is the basis of your problem. Don't store the same information in multiple tables. Store the results in a single table and let MySQL optimize the query.

If the original table had "information" -- say the month the data was generated -- then you may need to include this as an additional column.

Once the data is in a single table, you can use indexes and partitioning to speed the queries.

Note that storing the data in a single table may require changes to your ingestion processes -- namely, inserting the data rather than creating a new table. But your queries will be simpler and you can optimize the database.

As for which is faster, an IN or a JOIN. Both are doing similar things under the hood. In some circumstances, one or the other is faster, but both should make use of indexes and partitions if they are available.

Upvotes: 2

Related Questions