l3fty
l3fty

Reputation: 582

Increase performance on large dataset with pivot table relationships (Using Laravel)

Looking for advice on whether I can improve my database statements or if I should start caching query results to increase performance.

The schema is set up as a Many-To-Many Polymorphic relationship. I have a Videos table that contains video information, a Category table that contains all the categories and, a Categorizable table that contains the pivot information.

The ratio between Videos and Categorizable is around 1:4. (I.e. for every video there's at least 4+ categories).

The results when accessing pivot data with a 40 row limit and WITHOUT offset is: ~1.2s+. Adding offset would increase this even more when offset > 50,000 rows.

Although 1.2 seconds seems small, this is only a small portion of the whole data set which ultimately contains around 30 million video records (thus having ~12+ million categorizable records). I fear 1.2s will multiply with every million records.

Database schema

Videos table:

------------------------------------------------------------------------
id     | title                      | author    | views | duration | etc.
------------------------------------------------------------------------
1      | What's the biggest word?   | Dictonary | 3432  | 600      | ...
2      | Yearly Videos Roundup 2020 | YouTube   | 165   | 945      | ...
3      | Google SEO Help            | Google    | 1401  | 287      | ...
↓      
101234 | How to cook pasta          | YouTube   | 9401  | 87       | ...

Indexes:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table  | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
videos | 0          | PRIMARY          | 1            | id          | A         | 253057      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL
videos | 1          | idx_videos_views | 1            | views       | A         | 102188      | NULL     | NULL   |YES   | BTREE      |         |               | YES     | NULL

Categorizable table:

-------------------------------------------------------------
id      | category_id | cateogrizable_id | categorizable_type
-------------------------------------------------------------
1       | 5           |  1               | 'Video'
2       | 100         |  2               | 'Video'
3       | 31          |  3               | 'Video'
↓
299052  | 65          |  101234          | 'Video'

Indexes:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table          | Non_unique | Key_name             | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
categorizables | 0          | PRIMARY              | 1            | id               | A         | 296745      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL
categorizables | 1          | idx_category_id      | 1            | category_id      | A         | 82          | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL
categorizables | 1          | idx_categorizable_id | 1            | categorizable_id | A         | 104705      | NULL     | NULL   |      | BTREE      |         |               | YES     | NULL

Categories table:

--------------------
id  | name 
--------------------
1   | Education
2   | Health
3   | Entertainment
↓
100 | News

Indexes:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table       | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
categories  |  0         | PRIMARY  |  1           |  id         |  A        |  100        | NULL     | NULL   |      |  BTREE     |         |               |  YES    | NULL

MySQL

Type: InnoDB

Laravel Query:

Category::where('id', $cat)
  ->with(['videos' => function($query){ 
    return $query->take(40)->orderby('views'); 
   }])
   ->get();

Turns into MySQL query:

SELECT `videos`.`id`, `views` 
FROM `videos` inner join `categorizables` 
ON `videos`.`id` = `categorizables`.`categorizable_id`
WHERE `categorizables`.`category_id` = 1 
ORDER BY `views` desc 
LIMIT 40 offset 0

Performance Results

The following are the performance outputs from MySQL

---------------------------------------------------------
Stage                                          | Duration
---------------------------------------------------------
stage/sql/starting                             | 0.000068
stage/sql/Executing hook on transaction begin. | 0.000000
stage/sql/starting                             | 0.000003
stage/sql/checking permissions                 | 0.000001
stage/sql/checking permissions                 | 0.000001
stage/sql/Opening tables                       | 0.000038
stage/sql/init                                 | 0.000003
stage/sql/System lock                          | 0.000005
stage/sql/optimizing                           | 0.000007
stage/sql/statistics                           | 0.005628
stage/sql/preparing                            | 0.000008
stage/sql/Creating tmp table                   | 0.000033
stage/sql/executing                            | 1.273442
stage/sql/end                                  | 0.000001
stage/sql/query end                            | 0.000001
stage/sql/waiting for handler commit           | 0.000008
stage/sql/removing tmp table                   | 0.000003
stage/sql/closing tables                       | 0.000006
stage/sql/freeing items                        | 0.000080
stage/sql/cleaning up                          | 0.000000

Specifically:

stage/sql/executing | 1.273442

Query Cost:

----------------------------------
Variable_name     | Value
----------------------------------
Last_query_cost   | 107258.575124

EDIT:

Explain Query

With sorting:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | select_type | table          | partitions | type   | possible_keys                        | key             | key_len | ref                                     | rows  | filtered | Extra
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  | SIMPLE      | categorizables | NULL       | ref    | idx_category_id,idx_categorizable_id | idx_category_id | 4       | const                                   | 51210 | 100.00   | Using temporary; Using filesort
1  | SIMPLE      | videos         | NULL       | eq_ref | PRIMARY                              | PRIMARY         | 4       | dev_db.categorizables.categorizable_id  | 1     | 100.00   | Using index

Without sorting:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | select_type | table          | partitions | type   | possible_keys                        | key             | key_len | ref   | rows  | filtered | Extra
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  | SIMPLE      | videos         | NULL       | index  | NULL                                 | PRIMARY         | 4       | NULL  | 40    | 100.00   | Backward index scan; Using index

Upvotes: 1

Views: 1333

Answers (2)

Rick James
Rick James

Reputation: 142298

Let me walk you through the worst case:

SELECT  v.`id`, v.`views`
    FROM  `videos` AS v
    inner join  `categorizables` AS c  ON v.`id` = c.`categorizable_id`
    WHERE  c.`category_id` = 1
    ORDER BY  v.`views` desc
    LIMIT  40 offset 50000 

The flow goes something like:

  1. Look in categorizables for all the rows with category_id = 1. This may or may not use an index: INDEX(category_id, categorizable_id) may help.
  2. For each of those rows, reach into videos to get views and id. Assuming that id is the PRIMARY KEY, I have no added recommendation.
  3. Gather all that stuff into a temporary table. (Presumably more than 50K rows?)
  4. Sort that table.
  5. Read through the sorted table, skipping over 50000 rows.
  6. Deliver 40 rows and quit.

I hope it is obvious that removing the sort or removing the offset or (etc), will lead to a simplified execution plan, and hence be faster.

You say there is a many-to-many relationship? Is that categorizables? Does it followed the performance tips here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table ?

Upvotes: 1

Japs
Japs

Reputation: 1052

You can look at this Laravel Debugbar to see how many duplicate queries you are doing.

Next you can eager load the relationship when you fetch the data e.g:

Index Controller

$videos = Videos::with('categories')->get();
return $videos;

You can use also Caching for Laravel e.g:

$videos = \Cache::rememberForever('key', function() {
  return Video::with('categories')->get();
});

return $videos;

Upvotes: 0

Related Questions