mysqlsqlstored-procedurescluster-computingranking

Reputation: 4676

How do I compute a ranking with MySQL stored procedures?

Let's assume we have this very simple table:

|class   |student|
---------------
 Math     Alice
 Math     Bob
 Math     Peter
 Math     Anne
 Music    Bob
 Music    Chis
 Music    Debbie
 Music    Emily
 Music    David
 Sports   Alice
 Sports   Chris
 Sports   Emily
 .
 .
 .

Now I want to find out, who I have the most classes in common with. So basically I want a query that gets as input a list of classes (some subset of all classes) and returns a list like:

|student |common classes|
 Brad     6
 Melissa  4
 Chris    3
 Bob      3
 .
 .
 .

What I'm doing right now is a single query for every class. Merging the results is done on the client side. This is very slow, because I am a very hardworking student and I'm attending around 1000 classes - and so do most of the other students. I'd like to reduce the transactions and do the processing on the server side using stored procedures. I have never worked with sprocs, so I'd be glad if someone could give me some hints on how to do that.

(note: I'm using a MySQL cluster, because it's a very big school with 1 million classes and several million students)

UPDATE

Ok, it's obvious that I'm not a DB expert ;) 4 times the nearly the same answer means it's too easy. Thank you anyway! I tested the following SQL statement and it's returning what I need, although it is very slow on the cluster (but that will be another question, I guess).

SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC

But actually I simplified my problem a bit too much, so let's make a bit it harder:

Some classes are more important than others, so they are weighted:

| class   | importance |
  Music     0.8
  Math      0.7
  Sports    0.01
  English   0.5
  ...

Additionally, students can be more ore less important. (In case you're wondering what this is all about... it's an analogy. And it's getting worse. So please just accept that fact. It has to do with normalizing.)

|student | importance |
 Bob       3.5
 Anne      4.2
 Chris     0.3
 ...

This means a simple COUNT() won't do it anymore. In order to find out who I have the most in common with, I want to do the following:

map<Student,float> studentRanking;
foreach (Class c in myClasses)
{
     float myScoreForClassC = getMyScoreForClass(c);
     List students = getStudentsAttendingClass(c);
     foreach (Student s in students)
     {
          float studentScoreForClassC = c.classImportance*s.Importance;
          studentRanking[s] += min(studentScoreForClassC, myScoreForClassC);
     }
}

I hope it's not getting too confusing.

I should also mention that I myself am not in the database, so I have to tell the SELECT statement / stored procedure, which classes I'm attending.

Upvotes: 2

Answers (4)

mathematical.coffee

Reputation: 56935

SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC

Update re your question update.

Assuming there's a table class_importance and student_importance as you describe above:

SELECT classes.student, SUM(ci.importance*si.importance) AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC

The only thing this doesn't have is the LEAST(weighted_importance, myScoreForClassC) because I don't know how you calculate that.

Supposing you have another table myScores:

class | score
Math      10
Sports     0
Music     0.8
...

You can combine it all like this (see the extra LEAST inside the SUM):

SELECT classes.student, SUM(LEAST(m.score,ci.importance*si.importance)) -- min
                         AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
LEFT JOIN myScores m ON classes.class=m.class                 -- add in myScores
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC

If your myScores didn't have a score for a particular class and you wanted to assign some default, you could use IFNULL(m.score,defaultvalue).

Upvotes: 1

MatBailie

Reputation: 86775

Do you need to specify the classes? Or could you just specify the student? Knowing the student would let you get their classes and then get the list of other students who share those classes.

SELECT
  otherStudents.Student,
  COUNT(*)   AS sharedClasses
FROM
  class_student_map          AS myClasses
INNER JOIN
  class_student_map          AS otherStudents
    ON  otherStudents.class    = myClasses.class
    AND otherStudents.student != myClasses.student
WHERE
  myClasses.student = 'Ben'
GROUP BY
  otherStudents.Student

EDIT

To follow up your edit, you just need to join on the new table and do your calculation.

Using the SQL example you gave in the edit...

SELECT
  classes_table.student,
  MIN(class_importance.importance * student_importance.importance) as rank
FROM
  classes_table
INNER JOIN
  class_important
    ON classes_table.class = class_importance.class
INNER JOIN
  student_important
    ON classes_table.student = student_importance.student
WHERE
  classes_table.class in (my_subject_list)
GROUP BY
  classes_table.student
ORDER BY
  2

Upvotes: 1

T I

Reputation: 9943

SELECT 
  tbl.student, 
  COUNT(tbl.class) AS common_classes
FROM
  tbl
WHERE tbl.class IN (SELECT 
                      sub.class 
                    FROM 
                      tbl AS sub 
                    WHERE 
                      (sub.student = "BEN")) -- substitue "BEN" as appropriate
GROUP BY tbl.student
ORDER BY common_classes DESC;

Upvotes: 2

defuz

Reputation: 27611

As I understand your question, you can simply run a query like this:

SELECT `student`, COUNT(`class`) AS `commonClasses` 
FROM `classes_to_students` 
WHERE `class` IN ('Math', 'Music', 'Sport') 
GROUP BY `student` 
ORDER BY `commonClasses` DESC

Upvotes: 1

How do I compute a ranking with MySQL stored procedures?

Answers (4)

Related Questions