Reputation: 17896
I have a table like so:
call_activity (
call_id TEXT,
activity_type TEXT,
activity_time TIMESTAMP,
PRIMARY KEY(call_id, activity_type, activity_time)
)
activity_type
may be one of about 9 different strings:
'started'
'completed' (about 5 variations on this)
'other' (these are the states that I want to display)
A call has a series of events, starting with 'started' and culminating in a completed event (one of 5 or so possible ones). I need a view of this with two columns: The first column must be the activity time of a call's 'started' event, the second column must be the most recent event for that call. This view must only have calls for which there is no completed event.
I have a set of nested joins on it, but they're slow as hell. I need a reasonably optimal view of this. Can anyone help me?
Upvotes: 0
Views: 272
Reputation: 79536
This solution I tested without any indexing, and on an incredibly small data set, so it will require some tweaking for your environment. You'll want an index on at least call_id (duh!) and activity_type. It also uses a custom aggregate function, LAST() (I use a similar FIRST() function in many of my own projects).
CREATE OR REPLACE FUNCTION slast(anyelement,anyelement) RETURNS anyelement AS $$
SELECT $2
$$ LANGUAGE sql IMMUTABLE STRICT;
CREATE AGGREGATE last (
sfunc = slast,
basetype = anyelement,
stype = anyelement
);
CREATE VIEW current_calls AS
SELECT min(activity_time) AS call_started,last(activity_type) AS current_activity
FROM (
SELECT call_id,activity_time,activity_type
FROM call_activity
WHERE call_id NOT IN (SELECT call_id FROM call_activity WHERE activity_type='completed')
ORDER BY activity_time
) AS x
GROUP BY call_id;
I make no guess whether this performs better or worse than some of the other proposals offered. I prefer it because (to me) it is more readable. But readability really must take a backseat to performance in things like this.
Upvotes: 1
Reputation: 14993
A solution without joins, using the CASE statement and grouping
select call_id ,
min(case when activity_type = 'started' then activity_time
else null
end) as timestamp_started,
max(activity_time) as timestamp_other
from call_activity
group by call_id
having
sum(case when activity_type = 'completed-1' then 1
when activity_type = 'completed-2' then 1
else 0
end) = 0
Upvotes: 0
Reputation: 76952
Change datatypes for your IDs, and try something like this (add 'started' to the list of others filter if you would like to include 'started' in the latest non-completed activity as well):
SELECT ca_s.activity_time AS timestamp_started,
ca_o.activity_time AS timestamp_other
FROM call_activity ca_s
LEFT JOIN call_activity ca_o
ON ca_s.call_id = ca_o.call_id
AND ca_o.activity_type IN ('other-1', 'other2-2', ...)
LEFT JOIN call_activity ca_c
ON ca_s.call_id = ca_c.call_id
AND ca_s.activity_type IN ('completed-1', 'completed-2', ...)
WHERE ca_s.activity_type = 'started'
AND ca_c.call_id IS NULL --// no complete events
Upvotes: 0
Reputation: 54872
To do this the db has to at least find all the started calls and find if any completed activities exist for it. Assuming that not-completed is a small set, fetching the most recent activity can be done as a subquery. Here's a query that does this:
SELECT c_started.call_id, c_started.activity_id AS started_time,
(SELECT MAX(c_recent.activity_time)
FROM call_activity AS c_recent
WHERE c_recent.call_id = c_started.call_id) AS recent_activity
FROM call_activity AS c_started
LEFT JOIN call_activity AS c_completed
ON c_started.call_id = c_completed.call_id
AND c_completed.activity_type IN ('completed 1' 'completed 2', ...)
WHERE c_started.activity_type = 'started'
AND c_completed.call_id IS NULL;
If you can add indexes, the first choice would be a partial index on call_id where activity_type is in the completed events (the same check as in the join condition). Another would be an index on activity_type, possibly partial with only 'started' events to speed up the initial scan. Finally a call_id, activity_time index would speed up the subquery if you have lots of events per call. You can also get that if you reorder activity_type and activity_time in the primary key.
To make this fast, I would create an active_calls table with only a call_id column and add an insert trigger on call_activity to insert into active_calls if 'started' is inserted and delete if 'completed' is inserted.
Upvotes: 0