Reputation: 9372

Improve SQL query performance

I have three tables where I store actual person data (person), teams (team) and entries (athlete). The schema of the three tables is:

Database schema

In each team there might be two or more athletes.

I'm trying to create a query to produce the most frequent pairs, meaning people who play in teams of two. I came up with the following query:

SELECT p1.surname, p1.name, p2.surname, p2.name, COUNT(*) AS freq
FROM person p1, athlete a1, person p2, athlete a2
WHERE
    p1.id = a1.person_id AND
    p2.id = a2.person_id AND
    a1.team_id = a2.team_id AND
    a1.team_id IN
          ( SELECT team.id
            FROM team, athlete
            WHERE team.id = athlete.team_id
            GROUP BY team.id
            HAVING COUNT(*) = 2 )
GROUP BY p1.id
ORDER BY freq DESC

Obviously this is a resource consuming query. Is there a way to improve it?

Upvotes: 2

Answers (5)

nativegrip

Reputation: 892

Here, Some tips to improve SQL select query performance like:

Use SET NOCOUNT ON it is help to decrease network traffic thus improve performance.
Use fully qualified procedure name (e.g. database.schema.objectname)
Use sp_executesql instead of execute for dynamic query
Don't use select * use select column1,column2,.. for IF EXISTS or SELECT operation
Avoid naming user Stored Procedure like sp_procedureName Becouse, If we use Stored Procedure name start with sp_ then SQL first search in master db. so it can down query performance.

Upvotes: 0

DRapp

Reputation: 48179

REVISION BASED on EXACTLY TWO PER TEAM

By the inner-most pre-aggregate of exactly TWO people, I can get each team with personA and PersonB to a single row per team using MIN() and MAX(). This way, the person's IDs will always be in low-high pair setup to be compared for future teams. Then, I can query the COUNT by the common Mate1 and Mate2 across ALL teams and directly get their Names.

SELECT STRAIGHT_JOIN
      p1.surname, 
      p1.name, 
      p2.surname, 
      p2.name, 
      TeamAggregates.CommonTeams
   from 
     ( select PreQueryTeams.Mate1,
              PreQueryTeams.Mate2,
              count(*) CommonTeams
           from
              ( SELECT team_id, 
                       min( person_id ) mate1, 
                       max( person_id ) mate2
                   FROM 
                       athlete 
                   group by 
                       team_id 
                   having count(*) = 2 ) PreQueryTeams
           group by
              PreQueryTeams.Mate1,
              PreQueryTeams.Mate2  ) TeamAggregates,
      person p1,
      person p2
   where
          TeamAggregates.Mate1 = p1.Person_ID
      and TeamAggregates.Mate2 = p2.Person_ID
   order by 
      TeamAggregates.CommonTeams

ORIGINAL ANSWER FOR TEAMS WITH ANY NUMBER OF TEAMMATES

I would do by the following. The inner prequery first joining all possible combinations of people on each individual team, but having person1 < person2 will eliminate counting the same person as person1 AND person2.. In addition, will prevent the reverse based on higher numbered person IDs... Such as

athlete   person   team
1         1        1   
2         2        1
3         3        1
4         4        1
5         1        2
6         3        2
7         4        2
8         1        3
9         4        3

So, from team 1 you would get person pairs of
1,2    1,3   1,4      2,3     2,4    3,4
and NOT get reversed duplicates such as 
2,1    3,1   4,1      3,2     4,2    4,3
nor same person
1,1    2,2   3,3   4,4 


Then from team 2, you would hav pairs of
1,3   1,4   3,4

Finally in team 3 the single pair of 
1,4

thus teammates 1,4 have occured in 3 common teams.

SELECT STRAIGHT_JOIN
      p1.surname, 
      p1.name, 
      p2.surname, 
      p2.name, 
      PreQuery.CommonTeams
   from 
      ( select
            a1.Person_ID Person_ID1,
            a2.Person_ID Person_ID2,
            count(*) CommonTeams
         from 
            athlete a1,
            athlete a2
         where
                a1.Team_ID = a2.Team_ID
            and a1.Person_ID < a2.Person_ID
         group by 
            1, 2
         having CommonTeams > 1 ) PreQuery,
      person p1,
      person p2
   where 
          PreQuery.Person_ID1 = p1.id
      and PreQuery.Person_ID2 = p2.id
   order by 
      PreQuery.CommonTeams

Upvotes: 0

Jon Black

Reputation: 16559

You might consider the following approach which uses triggers to maintain counters in your team and person tables so you can easily find out which teams have 2 or more athletes and which persons are in 2 or more teams.

(note: I've removed the surrogate id key from your athlete table in favour of a composite key which will better enforce data integrity. I've also renamed athlete to team_athlete)

drop table if exists person;
create table person
(
person_id int unsigned not null auto_increment primary key,
name varchar(255) not null,
team_count smallint unsigned not null default 0
)
engine=innodb;

drop table if exists team;
create table team 
(
team_id int unsigned not null auto_increment primary key,
name varchar(255) not null,
athlete_count smallint unsigned not null default 0,
key (athlete_count) 
)
engine=innodb;

drop table if exists team_athlete;
create table team_athlete
(
team_id int unsigned not null,
person_id int unsigned not null,
primary key (team_id, person_id), -- note clustered composite PK
key person(person_id) -- added index
)
engine=innodb;

delimiter #

create trigger team_athlete_after_ins_trig after insert on team_athlete
for each row
begin
  update team set athlete_count = athlete_count+1 where team_id = new.team_id;
  update person set team_count = team_count+1 where person_id = new.person_id;
end#

delimiter ;

insert into person (name) values ('p1'),('p2'),('p3'),('p4'),('p5');
insert into team (name) values ('t1'),('t2'),('t3'),('t4');

insert into team_athlete (team_id, person_id) values
(1,1),(1,2),(1,3),
(2,3),(2,4),
(3,1),(3,5);

select * from team_athlete;
select * from person;
select * from team;

select * from team where athlete_count >= 2;
select * from person where team_count >= 2;

EDIT

Added the following as initially misunderstood question:

Create a view which only includes teams of 2 persons.

drop view if exists teams_with_2_players_view;

create view teams_with_2_players_view as
select
 t.team_id,
 ta.person_id,
 p.name as person_name
from
 team t
inner join team_athlete ta on t.team_id = ta.team_id
inner join person p on ta.person_id = p.person_id
where
 t.athlete_count = 2;

Now use the view to find the most frequently occurring person pairs.

select 
 p1.person_id as p1_person_id,
 p1.person_name as p1_person_name,
 p2.person_id as p2_person_id,
 p2.person_name as p2_person_name,
 count(*) as counter
from
 teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on 
  p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
 p1.person_id, p2.person_id
order by
 counter desc;

Hope this helps :)

EDIT 2 checking performance

select count(*) as counter from person;

+---------+
| counter |
+---------+
|   10000 |
+---------+
1 row in set (0.00 sec)

select count(*) as counter from team;

+---------+
| counter |
+---------+
|  450000 |
+---------+
1 row in set (0.08 sec)

select count(*) as counter from team where athlete_count = 2;

+---------+
| counter |
+---------+
|  112644 |
+---------+
1 row in set (0.03 sec)

select count(*) as counter from team_athlete;

+---------+
| counter |
+---------+
| 1124772 |
+---------+
1 row in set (0.21 sec)

explain
select 
 p1.person_id as p1_person_id,
 p1.person_name as p1_person_name,
 p2.person_id as p2_person_id,
 p2.person_name as p2_person_name,
 count(*) as counter
from
 teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on 
  p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
 p1.person_id, p2.person_id
order by
 counter desc
limit 10;

+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
| id | select_type | table | type   | possible_keys       | key         | key_len | ref                 | rows  | Extra                                        |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | t     | ref    | PRIMARY,t_count_idx | t_count_idx | 2  | const               | 86588 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY,t_count_idx | PRIMARY     | 4  | foo_db.t.team_id    |     1 | Using where                                  |
|  1 | SIMPLE      | ta    | ref    | PRIMARY,person      | PRIMARY     | 4  | foo_db.t.team_id    |     1 | Using index                                  |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY             | PRIMARY     | 4  | foo_db.ta.person_id |     1 |                                              |
|  1 | SIMPLE      | ta    | ref    | PRIMARY,person      | PRIMARY     | 4  | foo_db.t.team_id    |     1 | Using where; Using index                     |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY             | PRIMARY     | 4  | foo_db.ta.person_id |     1 |                                              |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+

6 rows in set (0.00 sec)

select 
 p1.person_id as p1_person_id,
 p1.person_name as p1_person_name,
 p2.person_id as p2_person_id,
 p2.person_name as p2_person_name,
 count(*) as counter
from
 teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on 
  p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
 p1.person_id, p2.person_id
order by
 counter desc
limit 10;

+--------------+----------------+--------------+----------------+---------+
| p1_person_id | p1_person_name | p2_person_id | p2_person_name | counter |
+--------------+----------------+--------------+----------------+---------+
|          221 | person 221     |          739 | person 739     |       5 |
|          129 | person 129     |          249 | person 249     |       5 |
|          874 | person 874     |          877 | person 877     |       4 |
|          717 | person 717     |          949 | person 949     |       4 |
|          395 | person 395     |          976 | person 976     |       4 |
|          415 | person 415     |          828 | person 828     |       4 |
|          287 | person 287     |          470 | person 470     |       4 |
|          455 | person 455     |          860 | person 860     |       4 |
|           13 | person 13      |           29 | person 29      |       4 |
|            1 | person 1       |          743 | person 743     |       4 |
+--------------+----------------+--------------+----------------+---------+
10 rows in set (2.02 sec)

Upvotes: 2

Damodharan R

Reputation: 1507

Should there be an additional constraint a1.person_id != a2.person_id, to avoid creating a pair with the same player? This may not affect the final ordering of the results but will affect the accuracy of the count.

If possible you can add a column called athlete_count (with an index) in the team table which can be updated whenever a player is added or removed to a team and this can avoid the subquery which needs to go through the entire athlete table for finding the two player teams.

UPDATE1: Also, if I am understanding the original query correctly, when you group by p1.id you only get the number of times a player played in a two player team and not the count of the pair itself. You may have to Group BY p1.id, p2.id.

Upvotes: 0

ta.speot.is

Reputation: 27214

SELECT id
FROM team, athlete
WHERE team.id = athlete.team_id
GROUP BY team.id
HAVING COUNT(*) = 2

Performance Tip 1: You only need the athlete table here.

Upvotes: 4

Improve SQL query performance

Answers (5)

Related Questions