Reputation: 1904

MySQL - Count Values occurring between other values

I'd like to count how many occurrences of a value happen before a specific value

Below is my starting table

+-----------------+--------------+------------+
| Id              | Activity     |  Time      |
+-----------------+--------------+------------+
| 1               | Click        | 1392263852 |
| 2               | Error        | 1392263853 |
| 3               | Finish       | 1392263862 |
| 4               | Click        | 1392263883 |
| 5               | Click        | 1392263888 |
| 6               | Finish       | 1392263952 |
+-----------------+--------------+------------+

I'd like to count how many clicks happen before a finish happens. I've got a very roundabout way of doing it where I write a function to find the last finished activity and query the clicks between the finishes. Also repeat this for Error.

What I'd like to achieve is the below table

+-----------------+--------------+------------+--------------+------------+
| Id              | Activity     |  Time      | Clicks       | Error      |
+-----------------+--------------+------------+--------------+------------+
| 3               | Finish       | 1392263862 | 1            | 1          |
| 6               | Finish       | 1392263952 | 2            | 0          |
+-----------------+--------------+------------+--------------+------------+

This table is very long so I'm looking for an efficient solution. If anyone has any ideas.

Thanks heaps!

Upvotes: 2

Answers (4)

Andriy M

Reputation: 77707

Here's another method of using variables, which is somewhat different to @peterm's:

SELECT
  Id,
  Activity,
  Time,
  Clicks,
  Errors
FROM (
  SELECT
    t.*,
    @clicks := @clicks + (activity = 'Click') AS Clicks,
    @errors := @errors + (activity = 'Error') AS Errors,
    @clicks := @clicks * (activity <> 'Finish'),
    @errors := @errors * (activity <> 'Finish')
  FROM
    `starting` t
  CROSS JOIN
    (SELECT @clicks := 0, @errors := 0) i
  ORDER BY
    time
) AS s
WHERE Activity = 'Finish'
;

What's similar to Peter's query is that this one uses a subquery that's returning all the rows, setting some variables along the way and returning the variables' values as columns. That may be common to most methods that use variables, though, and that's where the similarity between these two queries ends.

The difference is in how the accumulated results are calculated. Here all the accumulation is done in the subquery, and the main query merely filters the derived dataset on Activity = 'Finish' to return the final result set. In contrast, the other query uses grouping and aggregation at the outer level to get the accumulated results, which may make it slower than mine in comparison.

At the same time Peter's suggestion is more easily scalable in terms of coding. If you happen to have to extend the number of activities to account for, his query would only need expansion in the form of adding one SUM(activity = '...') AS ... per new activity to the outer SELECT, whereas in my query you would need to add a variable and several expressions, as well as a column in the outer SELECT, per every new activity, which would bloat the resulting code much more quickly.

Upvotes: 0

Brian DeMilia

Reputation: 13248

Try:

select      x.id
            , x.activity
            , x.time
            , sum(case when y.activity = 'Click' then 1 else 0 end) as clicks
            , sum(case when y.activity = 'Error' then 1 else 0 end) as errors
from        tbl x, tbl y
where       x.activity = 'Finish'
        and y.time < x.time
        and (y.time > (select max(z.time) from tbl z where z.activity = 'Finish' and z.time < x.time)
         or x.time = (select min(z.time) from tbl z where z.activity = 'Finish'))
group by    x.id
            , x.activity
            , x.time
order by    x.id

Upvotes: 0

peterm

Reputation: 92805

A version that leverages user(session) variables

SELECT MAX(id) id,
       MAX(activity) activity,
       MAX(time) time,
       SUM(activity = 'Click') clicks,
       SUM(activity = 'Error') error
  FROM
(
  SELECT t.*, @g := IF(activity <> 'Finish' AND @a = 'Finish', @g + 1, @g) g, @a := activity
    FROM table1 t CROSS JOIN (SELECT @g := 0, @a := NULL) i
   ORDER BY time
) q
 GROUP BY g

Output:

| ID | ACTIVITY |       TIME | CLICKS | ERROR |
|----|----------|------------|--------|-------|
|  3 |   Finish | 1392263862 |      1 |     1 |
|  6 |   Finish | 1392263952 |      2 |     0 |

Here is SQLFiddle demo

Upvotes: 1

Gordon Linoff

Reputation: 1270401

This is a complicated problem. Here is an approach to solving it. The groups between the "finish" records need to be identified as being the same, by assigning a group identifier to them. This identifier can be calculated by counting the number of "finish" records with a larger id.

Once this is assigned, your results can be calculated using an aggregation.

The group identifier can be calculated using a correlated subquery:

select max(id) as id, 'Finish' as Activity, max(time) as Time,
       sum(Activity = 'Clicks') as Clicks, sum(activity = 'Error') as Error
from (select s.*,
             (select sum(s2.activity = 'Finish')
              from starting s2
              where s2.id >= s.id
             ) as FinishCount
      from starting s
     ) s
group by FinishCount;

Upvotes: 1

MySQL - Count Values occurring between other values

Answers (4)

Related Questions