user3576036
user3576036

Reputation: 1435

Ruby eager query causing load issues

To explain the problem I am facing, let me take an example from the Ruby Sequel documentation.

In the case of

Album.eager(:artists, :genre).all

This will be fine if the data is comparatively less.

If the data is huge this query will fire artist_id in thousands. If data is in millions that would be a million artist_ids being collected. Is there way to fetch the same output but with different optimized query?

Upvotes: 0

Views: 43

Answers (1)

Konstantin Strukov
Konstantin Strukov

Reputation: 3019

Just don't eager load large datasets, maybe? :)

(TL;DR: this is not an answer but rather a wordy way to say "there is no single simple answer" :))

Simple solutions for simple problems. Default (naive) eager loading strategy efficiently solves the N+1 problem for reasonably small relations, but if you try to eager load huge relations you might (and will) shoot own leg.

For example, if you fetch just 1000 albums with ids, say, (1...1000), then your ORM will fire additional eager loading queries for artists and genres that might look like select * from <whatever> where album_id in (1,2,3,...,1000) with 1000 ids in the in list. And this is already a problem on its own - performance of where ... in queries can be suboptimal even in modern dbs with their query planners as smart as Einstein. At certain data scale this will become awfully slow even with such a small batch of data. And if you try to eager load just everything (as in your example) - it's not feasible for almost any real-world data usage (except the smallest use cases).

So,

  1. In general, it is better to avoid loading "all" and load data in batches instead;
  2. EXPLAIN is your best friend - always analyze queries that your ORM fires. Even production grade battle-tested ORM can (and will) produce sub-optimal queries time to time;
  3. The latter is especially true for large datasets - at certain scale you will have no other choices but to move from nice ORM API to lower level custom-tailored SQL queries (at least for bottlenecks);
  4. At certain scale even custom SQL will not help any more - the problems of that scale need to be addressed on another level (data remodeling, caching, sharding, CQRS etc etc etc ...)

Upvotes: 1

Related Questions