Reputation: 1804
I have a table of matchups in different games, and I would like to calculate how dense the matchup space is for each game. Example table:
id | game  | start_dt
---+-------+-----------------
 1 | dota2 | 2020-01-01 15:00
 2 | dota2 | 2020-01-01 15:05
 3 | dota2 | 2020-01-01 18:00
 4 | cs-go | 2020-01-01 13:05
 5 | cs-go | 2020-01-01 13:15
 6 | dota2 | 2020-01-01 12:00
 7 | cs-go | 2020-01-01 14:45
Would ideally yield:
id | game  | start_dt         | time_group_id
---+-------+------------------+---------------
 6 | dota2 | 2020-01-01 12:00 | 1
 1 | dota2 | 2020-01-01 15:00 | 2
 2 | dota2 | 2020-01-01 15:05 | 2
 3 | dota2 | 2020-01-01 18:00 | 3
 4 | cs-go | 2020-01-01 13:05 | 4
 5 | cs-go | 2020-01-01 13:15 | 4
 7 | cs-go | 2020-01-01 14:45 | 5
This basically means that if the gap between a match and the previous match of the same game is less than or equal to 10 minutes, they are considered part of the same time group; otherwise a new time group starts. For example, matches 1 and 2 are only 5 minutes apart and share a group, while match 3 starts almost 3 hours later and opens a new one.
Those time_group_ids are then used to map useful information about matches and their time frequency.
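For context, this is roughly the kind of aggregation I want to run on top of those ids (just a sketch; grouped_fights is a hypothetical relation that already carries id, game, start_dt and time_group_id):
-- Sketch only: grouped_fights stands for the result of the grouping query,
-- e.g. wrapped in a CTE or a view.
SELECT
    game,
    time_group_id,
    COUNT(*)                      AS matches_in_group,
    MAX(start_dt) - MIN(start_dt) AS group_span
FROM grouped_fights
GROUP BY game, time_group_id
ORDER BY game, time_group_id;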
My code is below, and it does the job, but it doesn't produce evenly spaced ids, so I have to use a composite of the game VARCHAR and the group id for the field to uniquely represent a group. Please run it in the dbfiddle to see what I mean.
CREATE TABLE fight(
    id BIGSERIAL PRIMARY KEY,
    date TIMESTAMP NOT NULL,
    game VARCHAR NOT NULL
);
INSERT INTO fight(date, game)
VALUES
('2020-01-01 15:00'::TIMESTAMP, 'dota2'),
('2020-01-01 15:05'::TIMESTAMP, 'dota2'),
('2020-01-01 18:00'::TIMESTAMP, 'dota2'),
('2020-01-01 13:05'::TIMESTAMP, 'cs-go'),
('2020-01-01 13:15'::TIMESTAMP, 'cs-go'),
('2020-01-01 12:00'::TIMESTAMP, 'dota2'),
('2020-01-01 14:45'::TIMESTAMP, 'cs-go');
SELECT * FROM fight;
CREATE SEQUENCE seq START 1 CACHE 1;

SELECT
    a.id,
    a.game,
    a.start_dt,
    -- Reuse the current sequence value while the match is within 10 minutes of
    -- the previous one, otherwise draw a new value. The game name is appended
    -- because the same sequence value can end up shared by different games.
    (CASE WHEN (a.start_dt - INTERVAL '10 min' <= a.prev_start_dt) THEN currval('seq')
          ELSE nextval('seq')
     END)::VARCHAR || '|' || a.game AS time_group_id
FROM
(
    SELECT
        fight.id,
        fight.game,
        fight.date AS start_dt,
        LAG(fight.date, 1, fight.date) OVER (PARTITION BY fight.game ORDER BY fight.date) AS prev_start_dt
    FROM fight CROSS JOIN (SELECT setval('seq', 1)) s
) a
ORDER BY a.game, a.start_dt;
The question is: is there an ideal way to do this, or should I stick with what I have?
Upvotes: 1
Views: 122
Reputation: 1804
SELECT
    b.id,
    b.game,
    b.start_dt,
    SUM(b.time_group_count) OVER (ORDER BY b.game, b.start_dt) AS time_group_id
FROM
    (SELECT
         a.id,
         a.game,
         a.start_dt,
         -- 1 marks the start of a new group (first match of a game, or a gap
         -- longer than 10 minutes); 0 means the match joins the previous group.
         -- The running sum of these flags then yields the group id.
         CASE WHEN a.prev_start_dt IS NULL THEN 1
              WHEN (a.start_dt - INTERVAL '10 min' <= a.prev_start_dt) THEN 0
              ELSE 1
         END AS time_group_count
     FROM
         (
          SELECT
              fight.id,
              fight.game,
              fight.date AS start_dt,
              LAG(fight.date, 1) OVER (PARTITION BY fight.game ORDER BY fight.date) AS prev_start_dt
          FROM fight
         ) a
     ORDER BY a.game, a.start_dt) b;
This query is what gave me the results I really wanted. Really grateful for the cumulative sum idea by @Gordon Linoff, thank you!
Upvotes: 0
Reputation: 1270301
You don't need a sequence for this, just a cumulative sum:
SELECT f.*,
       COUNT(*) FILTER (WHERE prev_date < date - interval '10 min') OVER (ORDER BY date) as time_group_id
FROM (SELECT f.*,
             LAG(f.date) OVER (PARTITION BY f.game ORDER BY f.date) AS prev_date
      FROM fight f
     ) f;
Notes:
- This might start at 0 rather than 1. If that makes a difference, use 1 +.
- This produces a number, not a string. You can convert to a string (using ::text) if that is what you really need.
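Putting both notes together, a sketch of the adjusted query might look like this (same logic, just offset by 1 and cast to text):
-- Sketch: the same query, starting the group ids at 1 and returning text.
SELECT f.*,
       (1 + COUNT(*) FILTER (WHERE prev_date < date - interval '10 min') OVER (ORDER BY date))::text AS time_group_id
FROM (SELECT f.*,
             LAG(f.date) OVER (PARTITION BY f.game ORDER BY f.date) AS prev_date
      FROM fight f
     ) f;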
Here is a db<>fiddle
Upvotes: 1