user1324762

Reputation: 805

MySQL joins of small tables with huge tables in a complex query - how expensive are they?

Let's suppose that the database is big. I have a very complex query for a search results page. In the query below, you can see that I retrieve some attribute value ids from the user_profile table; education is one such attribute, for example. When I have the value id for the education attribute, I retrieve the label name for this id from a PHP array where the id is the array key.

  public static $education        = array(0 => 'No answer', 
                                          1 => 'High school',
                                          2 => 'Some college',
                                          3 => 'In college',
                                          4 => 'College graduate',
                                          5 => 'Grad / professional school',                                    
                                          6 => 'Post grad');     

It is similar with about 10 other attributes. Otherwise my query would be even more complex: I would need to create a table attribute_id_label and add another join for each attribute to retrieve the label name for that attribute's value id. That means 10 extra joins, which could slow down the query, but it would still be the correct way.
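
Roughly, that alternative would look something like this (table and column names are only illustrative, not something I already have):

    CREATE TABLE attribute_id_label (
        attr_id  TINYINT UNSIGNED NOT NULL,  -- 1 = education, 2 = occupation, ...
        value_id TINYINT UNSIGNED NOT NULL,  -- the id stored in user_profile
        label    VARCHAR(64)      NOT NULL,
        PRIMARY KEY (attr_id, value_id)
    );

    -- one extra join per attribute to turn the stored id into its label
    SELECT p.user_id,
           edu.label AS education,
           occ.label AS occupation
    FROM user_profile p
    JOIN attribute_id_label edu ON edu.attr_id = 1 AND edu.value_id = p.education
    JOIN attribute_id_label occ ON occ.attr_id = 2 AND occ.value_id = p.occupation;
    -- ...and so on for the remaining attributes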

So my question is: if the table attribute_id_label has only about 500 records, will 10 joins against it make any big difference, given that the table is very small? Even if the user_profile table is very big and the query is already quite complex, as you can see?

And here is my query:

    SELECT 
    group_concat(DISTINCT looking.looking_for SEPARATOR ',') as lookingFor, 
    group_concat(DISTINCT photo.photo ORDER BY photo.photo_id DESC SEPARATOR ',') as photos, 
    profile.user_id as userId, 
    url as profileUrl, 
    nickname, 
    avatar.photo, 
    city, 
    ethnicity, 
    education, 
    occupation, 
    income, 
    -- and 10 more fields like education, occupation, ethnicity...
    FROM user_profile profile 
    LEFT JOIN user_profile_photo photo ON photo.user_id=profile.user_id 
    LEFT JOIN user_profile_photo avatar ON avatar.photo_id=profile.photo_id 
    INNER JOIN user_profile_looking_for looking ON looking.user_id=profile.user_id 
    LEFT JOIN user_profile_txt txt ON txt.user_id = profile.user_id 
    INNER JOIN place a ON a.place_id=profile.place_id 
    INNER JOIN (SELECT lat, lon FROM place WHERE place_id = :place_id) b ON (3959 * acos( cos( radians(b.lat) ) * cos( radians( a.lat ) ) * cos( radians( a.lon ) - radians(b.lon) ) + sin( radians(b.lat) ) * sin( radians( a.lat ) ) ) ) < :within 
    GROUP BY profile.user_id LIMIT 0,12 

Most attributes won't be filled in by the user, and since you advise non-NULLable columns, what would be best to use for those unfilled attributes? I could add an extra 'No answer' option for each attribute, so each attribute would have an extra 'No answer' value. Take the attributes education and want as an example: attribute education has id 1, want has id 2.

eav_attribute_option 
option_id | attr_id | label 
1 | 1 | No answer 
2 | 1 | High school 
3 | 1 | ...  
4 | 2 | No answer 
5 | 2 | Opportunities 
6 | 2 | ... 

But now the problem is that the 'No answer' value is repeated for each attribute. On the other hand, this is the way to avoid NULL values. I am not sure if this is correct.
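
To illustrate what I mean, the profile columns would then be non-NULLable and default to the 'No answer' option, something like this (the *_option_id column names are just hypothetical):

    ALTER TABLE user_profile
        ADD education_option_id INT NOT NULL DEFAULT 1,  -- option_id 1 = 'No answer' for education
        ADD want_option_id      INT NOT NULL DEFAULT 4;  -- option_id 4 = 'No answer' for want

    SELECT p.user_id,
           edu.label AS education,  -- returns 'No answer' when the user skipped it
           wnt.label AS want
    FROM user_profile p
    JOIN eav_attribute_option edu ON edu.option_id = p.education_option_id
    JOIN eav_attribute_option wnt ON wnt.option_id = p.want_option_id;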

Upvotes: 0

Views: 1438

Answers (3)

O. Jones

Reputation: 108706

I have done a lot of this kind of codelist work. It typically helps performance more than it hurts. @alxklx pointed out the truth: that you must make sure your codelist tables (e.g. education) are well formed. That is,

  • the education_id column must be the unique primary key in the codelist table.
  • the education_id column should be a simple primitive data type. That is, make it an int instead of a decimal or varchar.
  • when education_id shows up in your data tables, it must be the same datatype that you use in the codelist table, and it must be non-NULLable. In other words, don't use NULL in your data table to indicate missing data.

If you do these things, your JOINs can look as simple as this:

  FROM people p
  JOIN education e ON p.education_id = e.education_id

and the RDBMS's optimizer knows they're straightforward 1:1 joins.

All that being said, any complex query needs to be examined both for functionality and performance before you put it into a live system.

If you have missing data in your people table, use an education_id (or some other attribute_id) of either zero or one. Put a row in each codelist table with id zero or one and a value of "unknown" or "user didn't tell us" or whatever makes sense. (You can choose either zero or one based on convenience for your application; I prefer zero, but that's just personal preference.)
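
As a rough sketch of that setup (names here are illustrative, not taken from your schema):

    CREATE TABLE education (
        education_id INT NOT NULL PRIMARY KEY,
        label        VARCHAR(64) NOT NULL
    );

    INSERT INTO education (education_id, label) VALUES
        (0, 'Unknown'),            -- the "user didn't tell us" row
        (1, 'High school'),
        (2, 'Some college');

    CREATE TABLE people (
        person_id    INT NOT NULL PRIMARY KEY,
        education_id INT NOT NULL DEFAULT 0   -- never NULL; 0 means missing
    );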

Upvotes: 1

Neville Kuyt

Reputation: 29629

In general - very, very general - when you join on a foreign key relationship, i.e. where the attribute_id is indeed a primary key with a corresponding index and an index-friendly data type like an INT, you can treat the join as effectively free from a performance point of view.

Best way to find out is to try it and ask EXPLAIN to tell you what's going on.
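
For example, something like this (reusing the lookup-table idea from your question; exact names are up to you):

    EXPLAIN
    SELECT p.user_id, e.label
    FROM user_profile p
    JOIN attribute_id_label e
        ON e.attr_id = 1 AND e.value_id = p.education;
    -- a ref/eq_ref access type and a used key on the small lookup table
    -- means the extra join costs you essentially nothing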

Upvotes: 0

alxkls

Reputation: 151

There are two very major things you need to consider: first, how big the tables are, and second, indexes. If an index is missing on a large table, or the data type of the field differs from the data type of the field in the table you join it to, it might as well take days or even months. Personally I've done far, far bigger selects with enormous tables and the results were pretty good, coming in at about 2 seconds. Use EXPLAIN SELECT to see how the query stands, and if something is not OK, describe your tables, show their indexes and compare. It's really hard to give you a definitive answer if we don't know your database design...
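
Something along these lines, using the table names from your question:

    EXPLAIN SELECT p.user_id
    FROM user_profile p
    INNER JOIN place a ON a.place_id = p.place_id;  -- how is the join executed?

    DESCRIBE user_profile;           -- column names and data types
    SHOW INDEX FROM user_profile;    -- which indexes exist on the joined columns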

Upvotes: 0
