Reputation: 6616
I have a PostgreSQL table that is mostly a bridge table but it also has some extra stuff.
Essentially it holds the information about players in a game. So we have a unique id for this instance of a player in a game. Then an id that is FK to game table, and an id that is FK to player table. There is also some other irrelevant stuff. Something like this:
Table players_games
| id | 12564
| player_id | 556
| game_id | 156184
What I want to do is find how many occurrences there are of a player playing with another. So, if player1 is in the same game as player2, they have played together once. There are 2+ players in a game.
So what I want to do is populate a new table, that holds three values: player_lo, player_hi, times_played.
And either have one row for each pair and the number of times they played, or if it ends up being more efficient, a row for each iteration and have the value set as 1 so these can be added together later, maybe distributed. So you might see something like:
p1, p2, 1
p1, p2, 1
And these get reduced later to:
p1, p2, 2
So I was wondering if there was some clever way to do this with SQL, or if there's SQL that can reduce my programming effort, before starting to write a slightly complex python script to do it.
Upvotes: 3
Views: 3122
Reputation: 44250
SET search_path='tmp';
DROP TABLE players_game CASCADE;
CREATE TABLE players_game
( game_id INTEGER NOT NULL
, player_id INTEGER NOT NULL
);
INSERT INTO players_game(game_id,player_id) VALUES
(1,100) ,(1,101) ,(2,100) ,(2,101)
,(3,100) ,(3,101) ,(4,102) ,(4,101)
;
WITH pair AS (
SELECT g1.player_id AS p1
, g2.player_id AS p2
FROM players_game g1
JOIN players_game g2 ON g1.game_id = g2.game_id
WHERE g1.player_id < g2.player_id
)
SELECT pa.p1 , pa.p2, COUNT(*) AS num_games
FROM pair pa
GROUP BY p1, p2
ORDER BY num_games DESC
;
Result:
SET
ERROR: table "players_game" does not exist
CREATE TABLE
INSERT 0 8
p1 | p2 | num_games
-----+-----+-----------
100 | 101 | 3
101 | 100 | 3
102 | 101 | 1
101 | 102 | 1
(4 rows)
Upvotes: 0
Reputation: 1269923
To do this, you need to do a self join on the player_games table. The first subquery is for the first player, and the second for the second player. The "first" player is the one with the lower player id.
select pg1.player_id as player1, pg2.player_id as player2, count(*) as num_games
from (select distinct game_id, player_id
from players_games pg
) pg1 join
(select distinct game_id, player_id
from players_games pg
) pg2
on pg1.game_id = pg2.game_id and
pg1.player_id < pg2.player_id
group by pg1.player_id, pg2.player_id
Note that the join condition uses a "<" on the player ids. This is to prevent counting duplicates (so players A,B are not also counted as B,A).
Also, I added a "distinct" in the inner subqueries just in case a single player might appear more than once for a given game. Perhaps this is not necessary. To be sure, you should have a unique index on the composite key game_id, player_id.
Upvotes: 3
Reputation: 20270
select p1, p2, count(*) from (
select
pg1.player_id as p1, pg1.game_id, pg2.player_id as p2
from
players_games pg1, players_games pg2
where
pg1.game_id = pg2.game_id and pg1.player_id != pg2.player_id
) foo
group by p1, p2
Note that this does a full join on players_games
so it can be very slow if the table is large. The key part is the group by
for getting the count.
Upvotes: 3