Reputation: 4676
Let's assume we have this very simple table:
|class |student|
---------------
Math Alice
Math Bob
Math Peter
Math Anne
Music Bob
Music Chis
Music Debbie
Music Emily
Music David
Sports Alice
Sports Chris
Sports Emily
.
.
.
Now I want to find out, who I have the most classes in common with. So basically I want a query that gets as input a list of classes (some subset of all classes) and returns a list like:
|student |common classes|
Brad 6
Melissa 4
Chris 3
Bob 3
.
.
.
What I'm doing right now is a single query for every class. Merging the results is done on the client side. This is very slow, because I am a very hardworking student and I'm attending around 1000 classes - and so do most of the other students. I'd like to reduce the transactions and do the processing on the server side using stored procedures. I have never worked with sprocs, so I'd be glad if someone could give me some hints on how to do that.
(note: I'm using a MySQL cluster, because it's a very big school with 1 million classes and several million students)
UPDATE
Ok, it's obvious that I'm not a DB expert ;) 4 times the nearly the same answer means it's too easy. Thank you anyway! I tested the following SQL statement and it's returning what I need, although it is very slow on the cluster (but that will be another question, I guess).
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
But actually I simplified my problem a bit too much, so let's make a bit it harder:
Some classes are more important than others, so they are weighted:
| class | importance |
Music 0.8
Math 0.7
Sports 0.01
English 0.5
...
Additionally, students can be more ore less important. (In case you're wondering what this is all about... it's an analogy. And it's getting worse. So please just accept that fact. It has to do with normalizing.)
|student | importance |
Bob 3.5
Anne 4.2
Chris 0.3
...
This means a simple COUNT() won't do it anymore. In order to find out who I have the most in common with, I want to do the following:
map<Student,float> studentRanking;
foreach (Class c in myClasses)
{
float myScoreForClassC = getMyScoreForClass(c);
List students = getStudentsAttendingClass(c);
foreach (Student s in students)
{
float studentScoreForClassC = c.classImportance*s.Importance;
studentRanking[s] += min(studentScoreForClassC, myScoreForClassC);
}
}
I hope it's not getting too confusing.
I should also mention that I myself am not in the database, so I have to tell the SELECT statement / stored procedure, which classes I'm attending.
Upvotes: 2
Views: 1491
Reputation: 56935
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
Update re your question update.
Assuming there's a table class_importance
and student_importance
as you describe above:
SELECT classes.student, SUM(ci.importance*si.importance) AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
The only thing this doesn't have is the LEAST(weighted_importance, myScoreForClassC)
because I don't know how you calculate that.
Supposing you have another table myScores
:
class | score
Math 10
Sports 0
Music 0.8
...
You can combine it all like this (see the extra LEAST
inside the SUM
):
SELECT classes.student, SUM(LEAST(m.score,ci.importance*si.importance)) -- min
AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
LEFT JOIN myScores m ON classes.class=m.class -- add in myScores
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
If your myScores
didn't have a score for a particular class and you wanted to assign some default, you could use IFNULL(m.score,defaultvalue)
.
Upvotes: 1
Reputation: 86775
Do you need to specify the classes? Or could you just specify the student? Knowing the student would let you get their classes and then get the list of other students who share those classes.
SELECT
otherStudents.Student,
COUNT(*) AS sharedClasses
FROM
class_student_map AS myClasses
INNER JOIN
class_student_map AS otherStudents
ON otherStudents.class = myClasses.class
AND otherStudents.student != myClasses.student
WHERE
myClasses.student = 'Ben'
GROUP BY
otherStudents.Student
EDIT
To follow up your edit, you just need to join on the new table and do your calculation.
Using the SQL example you gave in the edit...
SELECT
classes_table.student,
MIN(class_importance.importance * student_importance.importance) as rank
FROM
classes_table
INNER JOIN
class_important
ON classes_table.class = class_importance.class
INNER JOIN
student_important
ON classes_table.student = student_importance.student
WHERE
classes_table.class in (my_subject_list)
GROUP BY
classes_table.student
ORDER BY
2
Upvotes: 1
Reputation: 9943
SELECT
tbl.student,
COUNT(tbl.class) AS common_classes
FROM
tbl
WHERE tbl.class IN (SELECT
sub.class
FROM
tbl AS sub
WHERE
(sub.student = "BEN")) -- substitue "BEN" as appropriate
GROUP BY tbl.student
ORDER BY common_classes DESC;
Upvotes: 2
Reputation: 27611
As I understand your question, you can simply run a query like this:
SELECT `student`, COUNT(`class`) AS `commonClasses`
FROM `classes_to_students`
WHERE `class` IN ('Math', 'Music', 'Sport')
GROUP BY `student`
ORDER BY `commonClasses` DESC
Upvotes: 1