Dan Andreasson
Dan Andreasson

Reputation: 16222

How do I select average of aggregated subquery in Rails

I have the following data and SQL query that return the expected result. I'm now trying to achieve the same thing in Rails with Groupdate (and ActiveMedian).

user_id points created_at
1 5 2020-10-01
1 15 2020-11-01
1 20 2020-11-02
2 33 2020-11-01
SELECT
  AVG(points) AS points,
  DATE_TRUNC('MONTH', created_at) AS period
FROM
  (
    SELECT
      SUM(points) AS points,
      DATE_TRUNC('MONTH', created_at) AS created_at
    FROM
      user_points
    WHERE
      created_at IS NOT NULL
    GROUP BY
      user_id,
      DATE_TRUNC('MONTH', created_at)
  ) points_per_user_per_period
GROUP BY
  DATE_TRUNC('MONTH', created_at)

I expect

points period
5 2020-10-01
34 2020-11-01

Result

UserPoints.from(
  UserPoints.group(:user_id).group_by_month(:created_at).sum(:points)
).group_by_month(:created_at).average(:points)

but since .sum(:points) executes immediately it doesn't work. Could I formulate the query in another way or make sum not execute immediately? Any other ideas?

To summarize what the outcome should be; The average amount of points users get in a period.

Upvotes: 1

Views: 735

Answers (2)

inopinatus
inopinatus

Reputation: 3780

Solution #2, that is not a solution

This additional answer also uses Arel, and the Groupdate gem, but it's actually more dangerous than my recommended approach. I'm including it as a separate answer because a) it works, and b) it looks so elegant:

class UserPoints < ApplicationRecord
  def self.averages_by_month
    # average of points = total points / # of distinct users
    points_avg = arel_table[:points].sum / arel_table[:user_id].count(true)

    # execute, grouped by month
    group_by_month(:created_at).calculate(:itself, points_avg)
  end
end

and this gives the correct result! At least, it does at the time of writing with Rails 6.

Unfortunately, there is deception afoot; this approach relies on substantially more knowledge of Active Record internals than simply the Arel API.

Discussion, or Why This Is Bad

The #calculate method's documented parameters are:

relation.calculate(operation, column_name)

and although support for using Arel expressions in aggregate calculations was added intentionally, it is not documented in the public API. Alone, that mightn't be so terrible, but this method then relies on an implementation detail: Rails constructs the full aggregate expression by using the operation parameter as a method name when calling the Arel representation of the column parameter, and thereby obtaining the aggregation expression in return. By passing :itself I'm hijacking this internal communication between framework layers, and causing points_avg to return itself during the internal to-and-fro via Kernel#itself.

This is a metaprogramming trick, and like all stunts it is fun to demonstrate, but should not be part of anyone's production code, at least not unless the #calculate method is someday documented to accept a bare Arel expression, since we're otherwise depending on very deep knowledge of Rails internals, i.e. it's a maintenance no-no.

There are some more related assumptions hidden in there besides, about other elements in the guts of grouped aggregate expression evaluation, such as expecting the column aliasing to just handle whatever it's given with good grace. That works too, so some might say this is also a demonstration of how robust and versatile Rails can be, but it's definitely testing the boundary of reasonable assumptions.

Recommendation

Overall, despite the apparent brevity and elegance of this solution, it is obscuring rather more sorcery than I'm prepared to recommend for production use. Instead I present and explain it as an amusing novelty.

Who knows, someday this may even be explicitly supported.

Upvotes: 1

inopinatus
inopinatus

Reputation: 3780

This is possible if we drop down one stratum of Rails, and use the relational algebra (Arel) that underpins Active Record.

In this approach, we'll teach Arel about the date_trunc function ourselves*, then build a nested aggregate query for the inner summation that doesn't execute immediately, but is instead incorporated into the outer averaging aggregate:

class UserPoints
  def self.averages
    period = Arel::Nodes::NamedFunction.new('date_trunc', [
      Arel::Nodes::Quoted.new('month'),
      arel_table[:created_at]
    ])
    points = arel_table[:points].sum
    
    # The per-user aggregate sum subquery, as an abstract relational structure
    subquery = select(points.as("points"), period.as("period")).group(:user_id, :period)

    # Execute
    from(subquery, quoted_table_name).group(:period).average(:points)
  end
end

This approach is versatile. It is adaptable to composition with scoped relations; for example, if you wanted to write UserPoints.where(created_at: Time.current.all_year).averages, then insert an appropriate unscope in the last line, becoming:

from(subquery, quoted_table_name).unscope(:where).group(:period).average(:points)

Similarly, to use in conjunction with the Groupdate library, at least for the outer query*, try:

from(subquery, quoted_table_name).group_by_month(:period).average(:points)

There may even be opportunity to refactor this into a scope declaration, by omitting the final aggregate expressions, thus gaining the flexibility to use others.

Now for the caveat: Arel is a Rails internal API, which means if you want documentation you'll need to read its source code, and there may be breaking changes even during a minor release. That's actually quite rare, and it's fine to use Arel (many do) if your code is wearing the proper safety gear, which is of course an appropriate test case.


* I did not use the Groupdate gem for the inner aggregate query, because it lacks the means to name the result columns.

Upvotes: 1

Related Questions