Cannon Moyer
Cannon Moyer

Reputation: 3174

Difference between queries generated by joins and includes

I have the following query that loads items based on the document ordered date.

 items = ::Item.joins(:document).where(:documents => {:order_created_at => @start_date..@end_date, type: "Order"}).order(name: :asc)

Here is the generated SQL:

SELECT  `items`.*
    FROM  `items`
    INNER JOIN  `documents`  ON `documents`.`id` = `items`.`document_id`
    WHERE  `documents`.`order_created_at`
           BETWEEN '2020-06-01 05:00:00'
               AND '2020-07-08 04:59:59'
      AND  `documents`.`type` = 'Order'
    ORDER BY  `items`.`name` ASC

This does a basic inner join so it's quite obvious that I get ONLY items that match the selected documents. I want to optimize this query since I reference values in the Document model such as item.document.created_at. In order to do this I replaced joins with includes in the query above here is the new query:

items = ::Item.includes(:document).where(:documents => {:order_created_at => @start_date..@end_date, type: "Order"}).order(name: :asc)

This runs fine but I don't know if the query is logically the same. If you'll notice, there is now a left outer joins on the documents table with the items table. I don't entirely understand how to interpret that. I'm trying to query the exact same items as I did with the joins except with the includes method for optimization reasons. Here is the resulting query:

SELECT "items"."id" AS t0_r0, "items"."name" AS t0_r1, "items"."description" AS t0_r2, 
"items"."shipping" AS t0_r3, "items"."sku" AS t0_r4, "items"."eta" AS t0_
r5, "items"."warranty" AS t0_r6, "items"."part_number" AS t0_r7, "items"."status" AS t0_r8, 
"items"."url_key" AS t0_r9, "items"."allow_rma" AS t0_r10, "items"."price" AS t0_r11,
"items"."meta_title" AS t0_r12, "items"."meta_keywords" AS t0_r13, 
"items"."meta_description" AS t0_r14, "items"."image" AS t0_r15, "items"."thumbnail" AS 
t0_r16, "items"."popularity" AS t0_r17, "items"."purchased_count" AS t0_r18, 
"items"."quantity" AS t0_r19, "items"."weight" AS t0_r20, "items"."created_at" AS t0_r21, 
"items"."updated_at" AS t0_r22,
"items"."document_id" AS t0_r23, "items"."product_id" AS t0_r24, "items"."total" AS t0_r25, 
"items"."warehouse_name" AS t0_r26, "items"."quantity_available" AS t0_r27, "items".
"quantity_to_return" AS t0_r28, "items"."reason_to_return" AS t0_r29, 
"items"."item_condition" AS t0_r30, "items"."resolution" AS t0_r31, "items"."brand" AS 
t0_r32, "items"."category" AS t0_r33, "items"."mod_name" AS t0_r34, "items"."product_type" 
AS t0_r35, "items"."search_variations" AS t0_r36, "items"."model_number" AS t0_r37, 
"items"."quantity_returned" AS t0_r38, "items"."line_number" AS t0_r39, 
"items"."external_id" AS t0_r40, "documents"."id" AS t1_r0, "documents"."status" AS t1_r1, 
"documents"."subtotal" AS t1_r2, "documents"."shipping" AS t1_r3, "documents"."handling" AS 
t1_r4, "documents"."grand_total" AS t1_r5, "documents"."total_paid" AS t1_r6, 
"documents"."total_refunded" AS t1_r7, "documents"."total_due" AS t1_r8, 
"documents"."order_prefix" AS t1_r9, "documents"."order_postfix" AS t1_r10, 
"documents"."shipping_method" AS t1_r11, "documents"."status_code" AS t1_r12, 
"documents"."created_at" AS t1_r13, "documents"."updated_at" AS t1_r14, 
"documents"."user_id" AS t1_r15, "documents"."downloaded_by_shipworks" AS t1_r16, 
"documents"."transaction_id" AS t1_r17, "documents"."settled" AS t1_r18, 
"documents"."voided" AS t1_r19, "documents"."voided_date" AS t1_r20, 
"documents"."voided_transaction_id" AS t1_r21, "documents"."tax" AS t1_r22, 
"documents"."payment_gateway" AS t1_r23, "documents"."admin_id" AS t1_r24, 
"documents"."purchase_order_number" AS t1_r25, "documents"."loaded_success_page" AS t1_r26, 
"documents"."discount_amount" AS t1_r27, "documents"."type" AS t1_r28, "documents"."token" 
AS t1_r29, "documents"."shipping_method_id" AS t1_r30, "documents"."document_id" AS t1_r31, 
"documents"."pay_without_credit_card" AS t1_r32, "documents"."order_created_at" AS t1_r33, 
"documents"."discount_id" AS t1_r34, "documents"."submitted_from_admin" AS t1_r35, 
"documents"."order_number" AS t1_r36, "documents"."marketplace" AS t1_r37, 
"documents"."amazon_prime" AS t1_r38, "documents"."discount_code" AS t1_r39, 
"documents"."discount_applied" AS t1_r40, "documents"."fulfillment_channel" AS t1_r41, 
"documents"."submitted_from_cron" AS t1_r42, "documents"."shipping_discount" AS t1_r43, 
"documents"."part_discount" AS t1_r44, "documents"."last_downloaded_by_fulfillment" AS 
t1_r45 FROM "items" LEFT OUTER JOIN "documents" ON "documents"."id" = "items"."document_id" 
WHERE "documents"."order_created_at" BETWEEN '2020-06-01 05:00:00' AND '2020-07-08 04:59:59' 
AND "documents"."type" = 'Order' ORDER BY "items"."name" ASC

Upvotes: 0

Views: 75

Answers (3)

Rick James
Rick James

Reputation: 142528

documents:  INDEX(type, order_created_at)  -- in that order
items:  INDEX(document_id)

The spurious "LEFT" may be sloppiness by the package that generated the query. It is confusing to the reader, but harmless to the execution. MySQL's Optimizer will see that documents is not optional (due to the WHERE clauses) and ignore the LEFT.

Upvotes: 0

spickermann
spickermann

Reputation: 107107

It is important to note that includes and joins have two different use-cases. They might sometimes generate similar queries but in other cases, they will generate totally different queries.

joins on the one hand generate an INNER JOIN database query. Which can be used to query the records with conditions on the joined table. joins does not eager load associated records.

includes use-case, on the other hand, is to eager load associated records and avoid N+1 queries. Rails might use a database join to load the records and their associated records in one go. Or it might fire two queries to fetch all required records. There is no guarantee that it will always do a database JOIN.

That said: When you care about eager loading associated records the use includes. When your query depends on a database INNER JOIN then use joins. They are not interchangeable although it might look like in some cases.

Upvotes: 3

MatBailie
MatBailie

Reputation: 86798

Because you also filter on the document's fields, the effect of the LEFT OUTER JOIN is redundant.

The reason it's there is because includes is intended to not cause an item to be excluded even if it doesn't have a document.

If there is an item that doesn't have a document, however, the where will cause it to be excluded.

  • Because the order_created_at will be NULL
  • And NULL is not between the start and end date

Thus, you effectively have an inner join again.

Upvotes: 1

Related Questions