Spence
Spence

Reputation: 125

Finding the total time a match was standing in postgresql

I have a database that has a table called matchstats which includes a column called time and it is updated each time an action takes place. I also have a column called groundstatsid which when it is not null means the action took place on the ground as opposed to standing. Finally I have a column called Round.

Example:

Time | groundstatsid | Round

 1   | NULL          |  1
 8   | NULL          |  1
 15  | NULL          |  1
 18  | 1             |  1
 20  | 1             |  1
 22  | NULL          |  1
 30  | NULL          |  1
 1   | NULL          |  2

To get the full time standing I would basically want the query to take the first time (1) and store that, then look at groundstatsid until it sees a NON NULL value and take the time at that position, subtract by the earlier number stored to get the time in standup (17). Then it would continue to look for where groundstatsid IS NULL. Once it finds that value it should do the same process of looking until it finds a NON NULL value in groundstatsid or a new round, in which case it will start the whole process again.

Once it has gone through an entire match I would want it to Sum the results.

I would expect the query of the example to return 25.

Upvotes: 2

Views: 206

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656744

This will calculate standing time for any number of rounds:

SELECT round, sum(down_time - up_time) AS standing_time
FROM  (
   SELECT round, grp, standing, min(time) AS up_time
         ,CASE WHEN standing THEN 
             lead(min(time), 1, max(time)) OVER (PARTITION BY round
                                                 ORDER BY min(time))
          ELSE NULL END AS down_time
   FROM  (
      SELECT round, time, groundstatsid IS NULL AS standing
            ,count(groundstatsid) OVER (PARTITION BY round
                                        ORDER BY time) AS grp
      FROM tbl
      ) x
   GROUP BY 1, 2, standing
   ) y
WHERE  standing
GROUP  BY round
ORDER  BY round;

-> sqlfiddle

Explain

  • Subquery x:

    Exploit the fact that count() doesn't count NULL values (neither as aggregate nor as window function). Successive rows with "standing" action (groundstatsid IS NULL) end up with the same value for grp.

    Simplify groundstatsid to a boolean var standing, for ease of use and elegance.

  • Subquery y:

    Aggregate per group - standing time matters. From ground time we only need the first row after each standing phase.

    Take the minimum time per group as up_time (standing up)

    Take the time from the following row (lead(min(time) ...) as down_time (going on the ground). Note that you can use aggregated values in a window function:

    lead(min(time), 1, max(time)) OVER ... takes the next min(time) per round an defaults to max(time) of the current row if the round is over (no next row).

  • Final SELECT:

    Only take standing time into account: WHERE groundstatsid IS NULL

    sum(down_time - up_time) aggregates the total standing time per round.

    Result ordered per round. Voilá.

This makes heavy use of window functions. Needs PostgreSQL 8.4 or later.
You could do the same procedurally in a plpgsql function if performance is your paramount requirement.
Examples here or here.

Upvotes: 1

willglynn
willglynn

Reputation: 11520

I would boil this problem down one where you consider pairs of rows, sorted by time within each round. PostgreSQL can do this in one pass -- no JOINs, no PL/pgSQL -- using window functions:

SELECT
  round,
  first_value(time) OVER pair AS first_time,
  last_value(time) OVER pair AS last_time,
  first_value(groundstatsid IS NULL) OVER pair AS first_is_standing,
  last_value(groundstatsid IS NULL) OVER pair AS last_is_standing
FROM matchstats
WINDOW pair AS (PARTITION BY round ORDER BY time ROWS 1 PRECEDING);

This tells PostgreSQL to read the rows from the table (presumably constrained by WHERE fightid=? or something), but to consider each round separately for windowing operations. Window functions like first_value and last_value can access the "window", which I specified to be ORDER BY time ROWS 1 PRECEDING, meaning the window contains both the current row and the one immediately preceding it in time (if any). Thus, window functions let us directly output values for both the current row and its predecessor.

For the data you provided, this query yields:

 round | first_time | last_time | first_is_standing | last_is_standing 
-------+------------+-----------+-------------------+------------------
     1 |          1 |         1 | t                 | t
     1 |          1 |         8 | t                 | t
     1 |          8 |        15 | t                 | t
     1 |         15 |        18 | t                 | f
     1 |         18 |        20 | f                 | f
     1 |         20 |        22 | f                 | t
     1 |         22 |        30 | t                 | t
     2 |          1 |         1 | t                 | t

Looking at these results helped me decide what to do next. Based on my understanding of your logic, I conclude that the person should be regarded as standing from time 1..1, 1..8, 8..15, 15..18, not standing from 18..20, not standing from 20..22, and is standing again from 22..30. In other words, we want to sum the difference between first_time and last_time where first_is_standing is true. Turning that back into SQL:

SELECT round, SUM(last_time - first_time) AS total_time_standing
FROM (
  SELECT
    round,
    first_value(time) OVER pair AS first_time,
    last_value(time) OVER pair AS last_time,
    first_value(groundstatsid IS NULL) OVER pair AS first_is_standing,
    last_value(groundstatsid IS NULL) OVER pair AS last_is_standing
  FROM matchstats
  WINDOW pair AS (PARTITION BY round ORDER BY time ROWS 1 PRECEDING)
) pairs
WHERE first_is_standing
GROUP BY round;
 round | total_time_standing 
-------+---------------------
     1 |                  25
     2 |                   0

You could also get other values from this same inner query, like the total time or the number of falls by using SUM(CASE WHEN ...) to count independent conditions:

SELECT
  round,
  SUM(CASE WHEN first_is_standing THEN last_time - first_time ELSE 0 END) AS total_time_standing,
  SUM(CASE WHEN first_is_standing AND NOT last_is_standing THEN 1 ELSE 0 END) AS falls,
  SUM(last_time - first_time) AS total_time
FROM (
  SELECT
    round,
    first_value(time) OVER pair AS first_time,
    last_value(time) OVER pair AS last_time,
    first_value(groundstatsid IS NULL) OVER pair AS first_is_standing,
    last_value(groundstatsid IS NULL) OVER pair AS last_is_standing
  FROM matchstats
  WINDOW pair AS (PARTITION BY round ORDER BY time ROWS 1 PRECEDING)
) pairs
GROUP BY round;

 round | total_time_standing | falls | total_time 
-------+---------------------+-------+------------
     1 |                  25 |     1 |         29
     2 |                   0 |     0 |          0

Upvotes: 4

Related Questions