Reputation: 2142
I have a model that is sorted in a particular order. My goal is to find a record from the model where the sum of a particular column of all previous records equals a certain number. The following example gets me what I need, but it is very slow, especially with a rather large table. Are there any faster ways to solve for the product.id where the sum of all previous products' points = 100000?
total_points = 0
find_point_level = 100000
@products = Product.order("id").all
@products.each do |product|
total_points = product.points + total_points
@find_product = product.id
break if total_points >= find_point_level
end
Update
Here are some times for a few of the solutions below. This is going through about 60,000 records. Times are for ActiveRecord.
Original example (above):
2685.0ms
1238.8ms
1428.0ms
Original example using find_each:
799.6ms
799.4ms
797.8ms
Creating a new column with the sums:
181.3ms
170.7ms
172.2ms
Upvotes: 5
Views: 2189
Reputation: 54684
It turns out, there actually is a way to do this in SQL. First, let's set up some testing environment:
rails new foobar
cd foobar
rails g model Product name:string points:integer
rake db:migrate
rails console
In the Rails console, feed the DB with some records:
Product.new(name: 'Foo', points: 1).save!
Product.new(name: 'Bar', points: 2).save!
Product.new(name: 'Baz', points: 3).save!
Product.new(name: 'Baf', points: 4).save!
Product.new(name: 'Quux', points: 5).save!
Now i found a way of getting running totals in SQL in this post here. It works like this:
query = <<-SQL
SELECT *, (
SELECT SUM(points)
FROM products
WHERE id <= p.id
) AS total_points
FROM products p
SQL
Running this query against the test DB gives us:
Product.find_by_sql(query).each do |p|
puts p.name.ljust(5) + p.points.to_s.rjust(2) + p.total_points.to_s.rjust(3)
end
# Foo 1 1
# Bar 2 3
# Baz 3 6
# Baf 4 10
# Quux 5 15
So we can now use a HAVING
clause (and a GROUP BY
because this is needed for HAVING
)to fetch only the products that match the condition and LIMIT
the number of results to one:
query = <<-SQL
SELECT *, (
SELECT SUM(points)
FROM products
WHERE id <= p.id
) AS total_points
FROM products p
GROUP BY p.id
HAVING total_points >= #{find_point_level}
LIMIT 1
SQL
I'm really curious how this performs in your environment with many many records. Give it a try and tell me if it works for you, if you like.
Upvotes: 1
Reputation: 12564
find_each
instead of each
to load products in batches instead of loading all the table. see the guidesEDIT ignore the following, i forgot that window functions do not permit WHERE and HAVING clauses
if you are willing to use a non db-agnostic solution, you can use this (not tested):
query = <<-SQL
SELECT id, SUM(points) OVER (ORDER BY id) AS total_points
FROM products
HAVING total_points >= 100000
LIMIT 1
SQL
@product = Product.find_all_by_sql( query )
this uses window functions that are NOT supported by all RDBMS (Postgresql does). Beware, once you have retrieved the @product
, it will be a readonly record with only two attributes accessible: id
and total_points
Upvotes: 0
Reputation: 868
If table is very big, you can use plain sql query:
find_point_level = 100000
Product.find_all_by_sql("SELECT SUM(points) FROM (SELECT points FROM products ORDER BY id LIMIT #{find_point_level}) AS subquery")
Also for column index should be exist index in the database.
Upvotes: -2
Reputation: 5081
You can try denormalizing your database, and keeping partial sums directly in products
table. Simple query with where
and limit
would return you a proper answer in no time.
You need to create additional filter, that will update single record whenever product is added, and all products whenever product is deleted or it's points
field were changed.
Upvotes: 6