miler350

Reputation: 1421

Large Results Ruby on Rails

I am testing performance of a site by seeding ~10,000 records.

I want to render them all to one output page.

So let's say I have a User model (10,000 users) and I want to render them all to one page. The hypothetical is that I want them all pre-loaded into a search box, so I need all of the records available at once. I know there are solutions for this exact example, but bear with me.

A few questions:

  1. How does batch finding work and is there some way I can find these records in smaller increments (in the background) at the controller/model level and not just by adding auto-scrolling or pagination to the view?

  2. I don't need all of the data from the user, so I can also use .pluck. For example, in theory I may be able to do User.pluck(:date_of_birth). But what if I had an age method on User, where user.age returns the age as an integer based on the .date_of_birth attribute? Is there a method as lightweight as pluck that lets us grab method results (e.g. .pluck_method(:age))? The issue is that some of what I want to display is not an attribute. Could I pluck the attributes and do any manipulation in a helper?

  3. If I use pluck, my collection is now an array instead of a collection of ActiveRecord objects. Is it best to convert to a hash or just access the array by order?

Just trying to make a few different sites as fast as possible. Generally the code is fast, but it needs to hold up at scale.

Upvotes: 1

Views: 952

Answers (1)

ajjahn

Reputation: 192

Rendering all records into a single page will inevitably cease to be a viable solution as your dataset grows. You'll eventually hit the limits of server-side resources and processing time, as well as limitations in web clients. Of course, you'll need to do your own profiling and benchmarking to determine where that boundary is. Until then, here are some things that answer your questions:

  1. (a) Rails provides the .find_each method on ActiveRecord::Relation. It batches large queries into smaller chunks (1000 records by default) to avoid overworking the database and loading your entire dataset into memory at once. Your query might look like: User.find_each(batch_size: 2000). See the Rails documentation for find_each, and also have a look at find_in_batches on the same page.

    (b) find_each will only solve half the problem, because Rails renders the entire page view and template in memory before sending it back down to the client. ActionController::Streaming lets you incrementally send bits of the page back to the client as they are rendered. From the ActionController::Streaming documentation:

    Streaming inverts the rendering flow by rendering the layout first and streaming each part of the layout as they are processed. This allows the header of the HTML (which is usually in the layout) to be streamed back to client very quickly, allowing JavaScripts and stylesheets to be loaded earlier than usual.

    This approach was introduced in Rails 3.1 and is still improving. Several Rack middlewares may not work and you need to be careful when streaming. Those points are going to be addressed soon.

    In order to use streaming, you will need to use a Ruby version that supports fibers (fibers are supported since version 1.9.2 of the main Ruby implementation).

    (c) I'd highly recommend you also take a look at various caching techniques to avoid re-rendering views or even hitting the database at all. The Rails caching guide provides a great overview of different techniques that might work for your project.

  2. You can use the select method instead of pluck to limit the data retrieved from the database to the one or more columns you are interested in. select returns a collection of ActiveRecord objects just like a typical query, except that only the specified attributes are populated in memory. With select(:date_of_birth) you could still use your user.age method.

  3. select might be a better solution for you here as well. One thing to keep in mind is that instantiating ActiveRecord objects for each record will increase the memory required and use extra CPU cycles compared to the simple array that you get using pluck.
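
If you do stay with pluck for questions 2 and 3, here is a minimal sketch in plain Ruby of working with the plucked rows directly. It assumes User.pluck(:id, :name, :date_of_birth) returns rows in that column order with date_of_birth already cast to a Date. The :name column and the age_from and rows_to_hashes helpers are hypothetical names for illustration, and the sample rows stand in for a real query.

```ruby
require "date"

# Hypothetical helper (not part of Rails): age in whole years from a date of
# birth, so it can be applied to plucked values without building a User object.
def age_from(date_of_birth, today = Date.today)
  years = today.year - date_of_birth.year
  # Has this year's birthday already happened (or is it today)?
  had_birthday = (today.month > date_of_birth.month) ||
                 (today.month == date_of_birth.month && today.day >= date_of_birth.day)
  had_birthday ? years : years - 1
end

# Column order used for the pluck call; assumed for this example.
COLUMNS = %i[id name date_of_birth].freeze

# Turn each positional row into a hash keyed by column name, then add the
# derived :age value.
def rows_to_hashes(rows, columns, today = Date.today)
  rows.map do |row|
    record = columns.zip(row).to_h
    record.merge(age: age_from(record[:date_of_birth], today))
  end
end

# Stand-in for the result of User.pluck(*COLUMNS): an array of arrays.
rows = [
  [1, "Ada",  Date.new(1990, 6, 15)],
  [2, "Brin", Date.new(2000, 1, 2)]
]

users = rows_to_hashes(rows, COLUMNS)
users.each { |u| puts "#{u[:name]} (#{u[:age]})" }
```

Building hashes costs a little more than indexing each row by position, but it keeps the view from depending on column order, and it should still be lighter than instantiating a full model per row.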

Upvotes: 2
