Reputation: 125
I have a database that has a table called matchstats which includes a column called time and it is updated each time an action takes place. I also have a column called groundstatsid which when it is not null means the action took place on the ground as opposed to standing. Finally I have a column called Round.
Example:
Time | groundstatsid | Round
1 | NULL | 1
8 | NULL | 1
15 | NULL | 1
18 | 1 | 1
20 | 1 | 1
22 | NULL | 1
30 | NULL | 1
1 | NULL | 2
To get the full time standing I would basically want the query to take the first time (1) and store that, then look at groundstatsid until it sees a NON NULL value and take the time at that position, subtract by the earlier number stored to get the time in standup (17). Then it would continue to look for where groundstatsid IS NULL. Once it finds that value it should do the same process of looking until it finds a NON NULL value in groundstatsid or a new round, in which case it will start the whole process again.
Once it has gone through an entire match I would want it to Sum the results.
I would expect the query of the example to return 25.
Upvotes: 2
Views: 206
Reputation: 656744
This will calculate standing time for any number of rounds:
SELECT round, sum(down_time - up_time) AS standing_time
FROM (
SELECT round, grp, standing, min(time) AS up_time
,CASE WHEN standing THEN
lead(min(time), 1, max(time)) OVER (PARTITION BY round
ORDER BY min(time))
ELSE NULL END AS down_time
FROM (
SELECT round, time, groundstatsid IS NULL AS standing
,count(groundstatsid) OVER (PARTITION BY round
ORDER BY time) AS grp
FROM tbl
) x
GROUP BY 1, 2, standing
) y
WHERE standing
GROUP BY round
ORDER BY round;
Subquery x:
Exploit the fact that count()
doesn't count NULL
values (neither as aggregate nor as window function). Successive rows with "standing" action (groundstatsid IS NULL
) end up with the same value for grp
.
Simplify groundstatsid
to a boolean var standing
, for ease of use and elegance.
Subquery y:
Aggregate per group - standing time matters. From ground time we only need the first row after each standing phase.
Take the minimum time per group as up_time
(standing up)
Take the time
from the following row (lead(min(time) ...
) as down_time
(going on the ground). Note that you can use aggregated values in a window function:
lead(min(time), 1, max(time)) OVER ...
takes the next min(time)
per round an defaults to max(time)
of the current row if the round is over (no next row).
Final SELECT:
Only take standing time into account: WHERE groundstatsid IS NULL
sum(down_time - up_time)
aggregates the total standing time per round.
Result ordered per round. Voilá.
This makes heavy use of window functions. Needs PostgreSQL 8.4 or later.
You could do the same procedurally in a plpgsql function if performance is your paramount requirement.
Examples here or here.
Upvotes: 1
Reputation: 11520
I would boil this problem down one where you consider pairs of rows, sorted by time
within each round. PostgreSQL can do this in one pass -- no JOINs, no PL/pgSQL -- using window functions:
SELECT
round,
first_value(time) OVER pair AS first_time,
last_value(time) OVER pair AS last_time,
first_value(groundstatsid IS NULL) OVER pair AS first_is_standing,
last_value(groundstatsid IS NULL) OVER pair AS last_is_standing
FROM matchstats
WINDOW pair AS (PARTITION BY round ORDER BY time ROWS 1 PRECEDING);
This tells PostgreSQL to read the rows from the table (presumably constrained by WHERE fightid=?
or something), but to consider each round
separately for windowing operations. Window functions like first_value
and last_value
can access the "window", which I specified to be ORDER BY time ROWS 1 PRECEDING
, meaning the window contains both the current row and the one immediately preceding it in time (if any). Thus, window functions let us directly output values for both the current row and its predecessor.
For the data you provided, this query yields:
round | first_time | last_time | first_is_standing | last_is_standing
-------+------------+-----------+-------------------+------------------
1 | 1 | 1 | t | t
1 | 1 | 8 | t | t
1 | 8 | 15 | t | t
1 | 15 | 18 | t | f
1 | 18 | 20 | f | f
1 | 20 | 22 | f | t
1 | 22 | 30 | t | t
2 | 1 | 1 | t | t
Looking at these results helped me decide what to do next. Based on my understanding of your logic, I conclude that the person should be regarded as standing from time 1..1, 1..8, 8..15, 15..18, not standing from 18..20, not standing from 20..22, and is standing again from 22..30. In other words, we want to sum the difference between first_time
and last_time
where first_is_standing
is true. Turning that back into SQL:
SELECT round, SUM(last_time - first_time) AS total_time_standing
FROM (
SELECT
round,
first_value(time) OVER pair AS first_time,
last_value(time) OVER pair AS last_time,
first_value(groundstatsid IS NULL) OVER pair AS first_is_standing,
last_value(groundstatsid IS NULL) OVER pair AS last_is_standing
FROM matchstats
WINDOW pair AS (PARTITION BY round ORDER BY time ROWS 1 PRECEDING)
) pairs
WHERE first_is_standing
GROUP BY round;
round | total_time_standing
-------+---------------------
1 | 25
2 | 0
You could also get other values from this same inner query, like the total time or the number of falls by using SUM(CASE WHEN ...)
to count independent conditions:
SELECT
round,
SUM(CASE WHEN first_is_standing THEN last_time - first_time ELSE 0 END) AS total_time_standing,
SUM(CASE WHEN first_is_standing AND NOT last_is_standing THEN 1 ELSE 0 END) AS falls,
SUM(last_time - first_time) AS total_time
FROM (
SELECT
round,
first_value(time) OVER pair AS first_time,
last_value(time) OVER pair AS last_time,
first_value(groundstatsid IS NULL) OVER pair AS first_is_standing,
last_value(groundstatsid IS NULL) OVER pair AS last_is_standing
FROM matchstats
WINDOW pair AS (PARTITION BY round ORDER BY time ROWS 1 PRECEDING)
) pairs
GROUP BY round;
round | total_time_standing | falls | total_time
-------+---------------------+-------+------------
1 | 25 | 1 | 29
2 | 0 | 0 | 0
Upvotes: 4