Reputation: 443
I have a database of people and projects. How can I find the names of people who collaborated with a given person, and on how many projects?
For example, I want to find the collaborators of Jimmy from the database:
+----------+--------+
| project | person |
+----------+--------+
| datamax | Jimmy |
| datamax | Ashley |
| datamax | Martin |
| cocoplus | Jimmy |
| cocoplus | Ashley |
| glassbox | Jimmy |
| glassbox | Martin |
| powerbin | Jimmy |
| powerbin | Ashley |
+----------+--------+
The result would look something like this:
Jimmy's collaborations:
+--------+----------------+
| person | collaborations |
+--------+----------------+
| Ashley | 3 |
| Martin | 2 |
+--------+----------------+
Upvotes: 0
Views: 231
Reputation: 21502
Join the table with itself, group by the person
field:
SELECT u2.person, COUNT(u1.project) AS collaborations
FROM users u1
JOIN users u2 ON u2.project = u1.project
WHERE u1.person != u2.person AND u1.person = 'Jimmy'
GROUP BY u2.person;
The query selects the projects in which Jimmy participated from u1
. The rows from u2
are filtered by the rows from u1
. Duplicate entries, where the users from both tables match, are filtered with WHERE
clause. Finally, the result set is grouped by person
, and the COUNT
function calculates the number of rows per group.
Performance
Note, an index for person
and project
columns (or two separate indexes) will significantly improve performance of the query above. Specific index configuration depends on the table structure. Although, I think the following is quite enough for a table with two varchar
fields for person
and project
, for instance:
ALTER TABLE users ADD INDEX `project` (`project`(10));
ALTER TABLE users ADD INDEX `person` (`person`(10));
Normalization
However, I would rather store persons and projects in separate tables with their numeric IDs. A third table could play the role of connector: person_id - project_id
. In other words, I recommend normalization. With normalized tables, you will not need to build bloated indexes for the text fields.
Normalized tables may look as follows:
CREATE TABLE users (
id int unsigned NOT NULL AUTO_INCREMENT,
name varchar(200) NOT NULL DEFAULT '',
PRIMARY KEY(`id`),
-- This index is needed, if you want to fetch users by names
INDEX name (name(8))
);
CREATE TABLE projects (
id int unsigned NOT NULL AUTO_INCREMENT,
name varchar(100) NOT NULL DEFAULT '',
PRIMARY KEY(`id`)
);
CREATE TABLE collaborations (
project_id int unsigned NOT NULL DEFAULT 0,
user_id int unsigned NOT NULL DEFAULT 0,
PRIMARY KEY(`project_id`, `user_id`)
);
The query for the normalized structures will look a little bit more complex:
-- In practice, the user ID is retrieved from the calling process
-- (such as POST/GET HTTP requests, for instance).
SET @user_id := (SELECT id FROM users WHERE name LIKE 'Jimmy');
SELECT u.name person, COUNT(p.id) collaborations
FROM collaborations c
JOIN collaborations c2 USING(project_id)
JOIN users u ON u.id = c2.user_id
JOIN projects p ON p.id = c2.project_id
WHERE c.user_id = @user_id AND c.user_id != c2.user_id
GROUP BY c2.user_id;
But it will be fast, and the space required for the indexes will be significantly smaller, especially for large data sets.
Original answer
To fetch the total number of projects for each person, use COUNT
function with GROUP BY
clause:
SELECT person, COUNT(*) AS collaborations
FROM users
GROUP BY person;
Upvotes: 2