Reputation: 1237
I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord doesn't support CTEs.
I have a table called user_activity_transitions
which contains a series of records of a user activity being started and stopped (each row refers to a change of state: e.g started or stopped).
One user_activity_id
might have a lot of couples started-stopped, which are in 2 different rows.
It's also possible that there is only "started" if the activity is currently going on and hasn't been stopped. The sort_key
starts at 0 with the first ever state and increments by 10 for each state change.
id to_state sort_key user_activity_id created_at
1 started 0 18 2014-11-15 16:56:00
2 stopped 10 18 2014-11-15 16:57:00
3 started 20 18 2014-11-15 16:58:00
4 stopped 30 18 2014-11-15 16:59:00
5 started 40 18 2014-11-15 17:00:00
What I want is the following output, grouping couples of started-stopped together to be able to calculate duration etc.
user_activity_id started_created_at stopped_created_at
18 2014-11-15 16:56:00 2014-11-15 16:57:00
18 2014-11-15 16:58:00 2014-11-15 16:59:00
18 2014-11-15 17:00:00 null
The way the table is implemented makes it much harder to run that query but much more flexible for future changes (e.g new intermediary states), so that's not going to be revised.
My Postgres query (and the associated code in Rails):
query = <<-SQL
with started as (
select
id,
sort_key,
user_activity_id,
created_at as started_created_at
from
user_activity_transitions
where
sort_key % 4 = 0
), stopped as (
select
id,
sort_key-10 as sort_key2,
user_activity_id,
created_at as stopped_created_at
from
user_activity_transitions
where
sort_key % 4 = 2
)
select
started.user_activity_id AS user_activity_id,
started.started_created_at AS started_created_at,
stopped.stopped_created_at AS stopped_created_at
FROM
started
left join stopped on stopped.sort_key2 = started.sort_key
and stopped.user_activity_id = started.user_activity_id
SQL
results = ActiveRecord::Base.connection.execute(query)
What it does is "trick" SQL into joining 2 consecutive rows based on a modulus check on the sort key.
The query works fine. But using this raw AR call annoys me, especially since what connection.execute
returns is quite messy. I basically need to loop through the results and put it in the right hash.
2 questions:
Bear in mind that I'm quite new to Rails and not a query expert so there might be an obvious improvement...
Thanks a lot!
Upvotes: 8
Views: 5688
Reputation: 15288
Now it's possible to use with
method. AR query looks like this:
results =
UserActivityTransition
.with(started: UserActivityTransition.select('id, sort_key, user_activity_id, created_at AS started_created_at').where('sort_key % 4 = 0'))
.with(stopped: UserActivityTransition.select('id, sort_key-10 as sort_key2, user_activity_id, created_at AS stopped_created_at').where('sort_key % 4 = 2'))
.select('started.user_activity_id AS user_activity_id, started.started_created_at AS started_created_at, stopped.stopped_created_at AS stopped_created_at')
.from('started')
.joins('LEFT JOIN stopped ON stopped.sort_key2 = started.sort_key AND stopped.user_activity_id = started.user_activity_id')
Let's verify that SQL is correct
puts results.to_sql
Output:
WITH "started" AS (SELECT id, sort_key, user_activity_id, created_at AS started_created_at FROM "user_activity_transitions" WHERE (sort_key % 4 = 0)), "stopped" AS (SELECT id, sort_key-10 as sort_key2, user_activity_id, created_at AS stopped_created_at FROM "user_activity_transitions" WHERE (sort_key % 4 = 2)) SELECT started.user_activity_id AS user_activity_id, started.started_created_at AS started_created_at, stopped.stopped_created_at AS stopped_created_at FROM started LEFT JOIN stopped ON stopped.sort_key2 = started.sort_key AND stopped.user_activity_id = started.user_activity_id
Now it's possible somehow to process data
results.each do |r|
puts "#{r.user_activity_id} | #{r.started_created_at} | #{r.stopped_created_at}"
end
Output
18 | 2014-11-15 16:56:00 | 2014-11-15 16:57:00
18 | 2014-11-15 16:58:00 | 2014-11-15 16:59:00
18 | 2014-11-15 17:00:00 |
This query from the question is large enough
More readable example of usage:
Book
.with(books_with_reviews: Book.where("reviews_count > ?", 0))
.with(books_with_ratings: Book.where("ratings_count > ?", 0))
.joins("JOIN books_with_reviews ON books_with_reviews.id = books.id")
# WITH books_with_reviews AS (
# SELECT * FROM books WHERE (reviews_count > 0)
# ), books_with_ratings AS (
# SELECT * FROM books WHERE (ratings_count > 0)
# )
# SELECT * FROM books JOIN books_with_reviews ON books_with_reviews.id = books.id
Upvotes: 0
Reputation: 164994
While Rails does not directly support CTEs, you can emulate a single CTE and still take advantage of ActiveRecord. Instead of a CTE, use a from
subquery.
Thing
.from(
# Using a subquery in place of a single CTE
Thing
.select(
'*',
%{row_number() over(
partition by
this, that
order by
created_at desc
) as rank
}
)
:things
)
.where(rank: 1)
This is not exactly the same as, but equivalent to...
with ranked_things as (
select
*,
row_number() over(
partition by
this, that
order by
created_at desc
) as rank
)
select *
from ranked_things
where rank = 1
Upvotes: 5
Reputation: 14402
I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord does support CTEs.
As far as I know ActiveRecord doesn't support CTE. Arel, which is used by AR under the hood, supports them, but they're not exposed to AR's interface.
Is there a way to get rid of the CTE and run the same query using Rails magic?
Not really. You could write it in AR's APIs but you'd just write the same SQL split into a few method calls.
If not, is there a better way to get the results I want in a nice-looking hash?
I tried to run the query and I'm getting the following which seems nice enough to me. Are you getting a different result?
[
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:56:00", "stopped_created_at"=>"2014-11-15 16:57:00"},
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:58:00", "stopped_created_at"=>"2014-11-15 16:59:00"},
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 17:00:00", "stopped_created_at"=>nil}
]
I assume you have a model called UserActivityTransition
you use for manipulating the data. You can use the model to get the results as well.
results = UserActivityTransition.find_by_sql(query)
results.size # => 3
results.first.started_created_at # => 2014-11-15 16:56:00 UTC
Note that these "virtual" attributes will not be visible when inspecting the result but they're there.
Upvotes: 3