Reputation: 1904
I'd like to count how many occurrences of a value happen before a specific value
Below is my starting table
+-----------------+--------------+------------+
| Id | Activity | Time |
+-----------------+--------------+------------+
| 1 | Click | 1392263852 |
| 2 | Error | 1392263853 |
| 3 | Finish | 1392263862 |
| 4 | Click | 1392263883 |
| 5 | Click | 1392263888 |
| 6 | Finish | 1392263952 |
+-----------------+--------------+------------+
I'd like to count how many clicks happen before a finish happens. I've got a very roundabout way of doing it where I write a function to find the last finished activity and query the clicks between the finishes. Also repeat this for Error.
What I'd like to achieve is the below table
+-----------------+--------------+------------+--------------+------------+
| Id | Activity | Time | Clicks | Error |
+-----------------+--------------+------------+--------------+------------+
| 3 | Finish | 1392263862 | 1 | 1 |
| 6 | Finish | 1392263952 | 2 | 0 |
+-----------------+--------------+------------+--------------+------------+
This table is very long so I'm looking for an efficient solution. If anyone has any ideas.
Thanks heaps!
Upvotes: 2
Views: 103
Reputation: 77707
Here's another method of using variables, which is somewhat different to @peterm's:
SELECT
Id,
Activity,
Time,
Clicks,
Errors
FROM (
SELECT
t.*,
@clicks := @clicks + (activity = 'Click') AS Clicks,
@errors := @errors + (activity = 'Error') AS Errors,
@clicks := @clicks * (activity <> 'Finish'),
@errors := @errors * (activity <> 'Finish')
FROM
`starting` t
CROSS JOIN
(SELECT @clicks := 0, @errors := 0) i
ORDER BY
time
) AS s
WHERE Activity = 'Finish'
;
What's similar to Peter's query is that this one uses a subquery that's returning all the rows, setting some variables along the way and returning the variables' values as columns. That may be common to most methods that use variables, though, and that's where the similarity between these two queries ends.
The difference is in how the accumulated results are calculated. Here all the accumulation is done in the subquery, and the main query merely filters the derived dataset on Activity = 'Finish'
to return the final result set. In contrast, the other query uses grouping and aggregation at the outer level to get the accumulated results, which may make it slower than mine in comparison.
At the same time Peter's suggestion is more easily scalable in terms of coding. If you happen to have to extend the number of activities to account for, his query would only need expansion in the form of adding one SUM(activity = '...') AS ...
per new activity to the outer SELECT, whereas in my query you would need to add a variable and several expressions, as well as a column in the outer SELECT, per every new activity, which would bloat the resulting code much more quickly.
Upvotes: 0
Reputation: 13248
Try:
select x.id
, x.activity
, x.time
, sum(case when y.activity = 'Click' then 1 else 0 end) as clicks
, sum(case when y.activity = 'Error' then 1 else 0 end) as errors
from tbl x, tbl y
where x.activity = 'Finish'
and y.time < x.time
and (y.time > (select max(z.time) from tbl z where z.activity = 'Finish' and z.time < x.time)
or x.time = (select min(z.time) from tbl z where z.activity = 'Finish'))
group by x.id
, x.activity
, x.time
order by x.id
Upvotes: 0
Reputation: 92805
A version that leverages user(session) variables
SELECT MAX(id) id,
MAX(activity) activity,
MAX(time) time,
SUM(activity = 'Click') clicks,
SUM(activity = 'Error') error
FROM
(
SELECT t.*, @g := IF(activity <> 'Finish' AND @a = 'Finish', @g + 1, @g) g, @a := activity
FROM table1 t CROSS JOIN (SELECT @g := 0, @a := NULL) i
ORDER BY time
) q
GROUP BY g
Output:
| ID | ACTIVITY | TIME | CLICKS | ERROR | |----|----------|------------|--------|-------| | 3 | Finish | 1392263862 | 1 | 1 | | 6 | Finish | 1392263952 | 2 | 0 |
Here is SQLFiddle demo
Upvotes: 1
Reputation: 1270401
This is a complicated problem. Here is an approach to solving it. The groups between the "finish" records need to be identified as being the same, by assigning a group identifier to them. This identifier can be calculated by counting the number of "finish" records with a larger id
.
Once this is assigned, your results can be calculated using an aggregation.
The group identifier can be calculated using a correlated subquery:
select max(id) as id, 'Finish' as Activity, max(time) as Time,
sum(Activity = 'Clicks') as Clicks, sum(activity = 'Error') as Error
from (select s.*,
(select sum(s2.activity = 'Finish')
from starting s2
where s2.id >= s.id
) as FinishCount
from starting s
) s
group by FinishCount;
Upvotes: 1