Reputation: 9372
I have three tables where I store actual person data (person
), teams (team
) and entries (athlete
). The schema of the three tables is:
In each team there might be two or more athletes.
I'm trying to create a query to produce the most frequent pairs, meaning people who play in teams of two. I came up with the following query:
SELECT p1.surname, p1.name, p2.surname, p2.name, COUNT(*) AS freq
FROM person p1, athlete a1, person p2, athlete a2
WHERE
p1.id = a1.person_id AND
p2.id = a2.person_id AND
a1.team_id = a2.team_id AND
a1.team_id IN
( SELECT team.id
FROM team, athlete
WHERE team.id = athlete.team_id
GROUP BY team.id
HAVING COUNT(*) = 2 )
GROUP BY p1.id
ORDER BY freq DESC
Obviously this is a resource consuming query. Is there a way to improve it?
Upvotes: 2
Views: 434
Reputation: 892
Here, Some tips to improve SQL select query performance like:
SET NOCOUNT ON
it is help to decrease network traffic thus
improve performance.database.schema.objectname
)sp_executesql
instead of execute
for dynamic queryselect *
use select column1,column2,..
for IF EXISTS
or SELECT
operationsp_procedureName
Becouse,
If we use Stored Procedure name start with sp_
then SQL first
search in master db. so it can down query performance.Upvotes: 0
Reputation: 48139
REVISION BASED on EXACTLY TWO PER TEAM
By the inner-most pre-aggregate of exactly TWO people, I can get each team with personA and PersonB to a single row per team using MIN() and MAX(). This way, the person's IDs will always be in low-high pair setup to be compared for future teams. Then, I can query the COUNT by the common Mate1 and Mate2 across ALL teams and directly get their Names.
SELECT STRAIGHT_JOIN
p1.surname,
p1.name,
p2.surname,
p2.name,
TeamAggregates.CommonTeams
from
( select PreQueryTeams.Mate1,
PreQueryTeams.Mate2,
count(*) CommonTeams
from
( SELECT team_id,
min( person_id ) mate1,
max( person_id ) mate2
FROM
athlete
group by
team_id
having count(*) = 2 ) PreQueryTeams
group by
PreQueryTeams.Mate1,
PreQueryTeams.Mate2 ) TeamAggregates,
person p1,
person p2
where
TeamAggregates.Mate1 = p1.Person_ID
and TeamAggregates.Mate2 = p2.Person_ID
order by
TeamAggregates.CommonTeams
ORIGINAL ANSWER FOR TEAMS WITH ANY NUMBER OF TEAMMATES
I would do by the following. The inner prequery first joining all possible combinations of people on each individual team, but having person1 < person2 will eliminate counting the same person as person1 AND person2.. In addition, will prevent the reverse based on higher numbered person IDs... Such as
athlete person team
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 3 2
7 4 2
8 1 3
9 4 3
So, from team 1 you would get person pairs of
1,2 1,3 1,4 2,3 2,4 3,4
and NOT get reversed duplicates such as
2,1 3,1 4,1 3,2 4,2 4,3
nor same person
1,1 2,2 3,3 4,4
Then from team 2, you would hav pairs of
1,3 1,4 3,4
Finally in team 3 the single pair of
1,4
thus teammates 1,4 have occured in 3 common teams.
SELECT STRAIGHT_JOIN
p1.surname,
p1.name,
p2.surname,
p2.name,
PreQuery.CommonTeams
from
( select
a1.Person_ID Person_ID1,
a2.Person_ID Person_ID2,
count(*) CommonTeams
from
athlete a1,
athlete a2
where
a1.Team_ID = a2.Team_ID
and a1.Person_ID < a2.Person_ID
group by
1, 2
having CommonTeams > 1 ) PreQuery,
person p1,
person p2
where
PreQuery.Person_ID1 = p1.id
and PreQuery.Person_ID2 = p2.id
order by
PreQuery.CommonTeams
Upvotes: 0
Reputation: 16559
You might consider the following approach which uses triggers to maintain counters in your team and person tables so you can easily find out which teams have 2 or more athletes and which persons are in 2 or more teams.
(note: I've removed the surrogate id key from your athlete table in favour of a composite key which will better enforce data integrity. I've also renamed athlete to team_athlete)
drop table if exists person;
create table person
(
person_id int unsigned not null auto_increment primary key,
name varchar(255) not null,
team_count smallint unsigned not null default 0
)
engine=innodb;
drop table if exists team;
create table team
(
team_id int unsigned not null auto_increment primary key,
name varchar(255) not null,
athlete_count smallint unsigned not null default 0,
key (athlete_count)
)
engine=innodb;
drop table if exists team_athlete;
create table team_athlete
(
team_id int unsigned not null,
person_id int unsigned not null,
primary key (team_id, person_id), -- note clustered composite PK
key person(person_id) -- added index
)
engine=innodb;
delimiter #
create trigger team_athlete_after_ins_trig after insert on team_athlete
for each row
begin
update team set athlete_count = athlete_count+1 where team_id = new.team_id;
update person set team_count = team_count+1 where person_id = new.person_id;
end#
delimiter ;
insert into person (name) values ('p1'),('p2'),('p3'),('p4'),('p5');
insert into team (name) values ('t1'),('t2'),('t3'),('t4');
insert into team_athlete (team_id, person_id) values
(1,1),(1,2),(1,3),
(2,3),(2,4),
(3,1),(3,5);
select * from team_athlete;
select * from person;
select * from team;
select * from team where athlete_count >= 2;
select * from person where team_count >= 2;
EDIT
Added the following as initially misunderstood question:
Create a view which only includes teams of 2 persons.
drop view if exists teams_with_2_players_view;
create view teams_with_2_players_view as
select
t.team_id,
ta.person_id,
p.name as person_name
from
team t
inner join team_athlete ta on t.team_id = ta.team_id
inner join person p on ta.person_id = p.person_id
where
t.athlete_count = 2;
Now use the view to find the most frequently occurring person pairs.
select
p1.person_id as p1_person_id,
p1.person_name as p1_person_name,
p2.person_id as p2_person_id,
p2.person_name as p2_person_name,
count(*) as counter
from
teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on
p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
p1.person_id, p2.person_id
order by
counter desc;
Hope this helps :)
EDIT 2 checking performance
select count(*) as counter from person;
+---------+
| counter |
+---------+
| 10000 |
+---------+
1 row in set (0.00 sec)
select count(*) as counter from team;
+---------+
| counter |
+---------+
| 450000 |
+---------+
1 row in set (0.08 sec)
select count(*) as counter from team where athlete_count = 2;
+---------+
| counter |
+---------+
| 112644 |
+---------+
1 row in set (0.03 sec)
select count(*) as counter from team_athlete;
+---------+
| counter |
+---------+
| 1124772 |
+---------+
1 row in set (0.21 sec)
explain
select
p1.person_id as p1_person_id,
p1.person_name as p1_person_name,
p2.person_id as p2_person_id,
p2.person_name as p2_person_name,
count(*) as counter
from
teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on
p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
p1.person_id, p2.person_id
order by
counter desc
limit 10;
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
| 1 | SIMPLE | t | ref | PRIMARY,t_count_idx | t_count_idx | 2 | const | 86588 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | t | eq_ref | PRIMARY,t_count_idx | PRIMARY | 4 | foo_db.t.team_id | 1 | Using where |
| 1 | SIMPLE | ta | ref | PRIMARY,person | PRIMARY | 4 | foo_db.t.team_id | 1 | Using index |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 4 | foo_db.ta.person_id | 1 | |
| 1 | SIMPLE | ta | ref | PRIMARY,person | PRIMARY | 4 | foo_db.t.team_id | 1 | Using where; Using index |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 4 | foo_db.ta.person_id | 1 | |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
6 rows in set (0.00 sec)
select
p1.person_id as p1_person_id,
p1.person_name as p1_person_name,
p2.person_id as p2_person_id,
p2.person_name as p2_person_name,
count(*) as counter
from
teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on
p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
p1.person_id, p2.person_id
order by
counter desc
limit 10;
+--------------+----------------+--------------+----------------+---------+
| p1_person_id | p1_person_name | p2_person_id | p2_person_name | counter |
+--------------+----------------+--------------+----------------+---------+
| 221 | person 221 | 739 | person 739 | 5 |
| 129 | person 129 | 249 | person 249 | 5 |
| 874 | person 874 | 877 | person 877 | 4 |
| 717 | person 717 | 949 | person 949 | 4 |
| 395 | person 395 | 976 | person 976 | 4 |
| 415 | person 415 | 828 | person 828 | 4 |
| 287 | person 287 | 470 | person 470 | 4 |
| 455 | person 455 | 860 | person 860 | 4 |
| 13 | person 13 | 29 | person 29 | 4 |
| 1 | person 1 | 743 | person 743 | 4 |
+--------------+----------------+--------------+----------------+---------+
10 rows in set (2.02 sec)
Upvotes: 2
Reputation: 1507
Should there be an additional constraint a1.person_id != a2.person_id, to avoid creating a pair with the same player? This may not affect the final ordering of the results but will affect the accuracy of the count.
If possible you can add a column called athlete_count (with an index) in the team table which can be updated whenever a player is added or removed to a team and this can avoid the subquery which needs to go through the entire athlete table for finding the two player teams.
UPDATE1: Also, if I am understanding the original query correctly, when you group by p1.id you only get the number of times a player played in a two player team and not the count of the pair itself. You may have to Group BY p1.id, p2.id.
Upvotes: 0
Reputation: 27214
SELECT id
FROM team, athlete
WHERE team.id = athlete.team_id
GROUP BY team.id
HAVING COUNT(*) = 2
Performance Tip 1: You only need the athlete
table here.
Upvotes: 4