Reputation: 1777
I'm currently working on an app that matches users based on answered questions. I realized my algorithm in normal RoR and ActiveRecord queries but it's waaay to slow to use it. To match one user with 100 other users takes
Completed 200 OK in 17741ms (Views: 106.1ms | ActiveRecord: 1078.6ms)
on my local machine. But still... I now want to realize this in raw SQL in order to gain some more performance. But I'm really having trouble getting my head around SQL queries inside of SQL queries and stuff like this plus calculations etc. My head is about to explode and I don't even know where to start.
Here's my algorithm:
def match(user)
@a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
@b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
if self.common_questions(user) == []
0.to_f
else
match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end
def possible_score(user)
i = 0
self.user_questions.select("question_id, importance").find_each do |n|
if user.user_questions.select(:id).find_by_question_id(n.question_id)
i += Importance.find_by_id(n.importance).value
end
end
return i
end
def actual_score(user)
i = 0
self.user_questions.select("question_id, importance").includes(:accepted_answers).find_each do |n|
@user_answer = user.user_questions.select("answer_id").find_by_question_id(n.question_id)
unless @user_answer == nil
if n.accepted_answers.select(:answer_id).find_by_answer_id(@user_answer.answer_id)
i += Importance.find_by_id(n.importance).value
end
end
end
return i
end
So basically a user answers a questions, picks what answers he accepts and how important that question is to him. The algorithm then checks what questions 2 users have in common, if user1 gave an answer user2 accepts, if yes then the importance user2 gave for each question is added which makes up the score user1 made. Also the other way around for user2. Divided by the possible score gives the percentage and both percentages applied to the geometric mean gives me one total match percentage for both users. Fairly complicated I know. Tell if I didn't explain it good enough. I just hope I can express this in raw SQL. Performance is everything in this.
Here are my database tables:
CREATE TABLE "users" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "username" varchar(255) DEFAULT '' NOT NULL); (left some unimportant stuff out, it's all there in the databse dump i uploaded)
CREATE TABLE "user_questions" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_id" integer, "question_id" integer, "answer_id" integer(255), "importance" integer, "explanation" text, "private" boolean DEFAULT 'f', "created_at" datetime);
CREATE TABLE "accepted_answers" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_question_id" integer, "answer_id" integer);
I guess the top of the SQL query has to look something like this?
SELECT u1.id AS user1, u2.id AS user2, COALESCE(SQRT( (100.0*actual_score/possible_score) * (100.0*actual_score/possible_score) ), 0) AS match
FROM
But since I'm not an SQL master and can only do the usual stuff my head is about to explode. I hope someone can help me figure this out. Or atleast improve my performance somehow! Thanks so much!
So based on Wizard's answer I've managed to get a nice SQL statement for "possible_score"
SELECT SUM(value) AS sum_id
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
WHERE uq1.user_id = 1
I've tried to get the "actual_score" with this but it didn't work. My database manager crashed when I executed this.
SELECT SUM(imp.value) AS sum_id
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
INNER JOIN accepted_answers as ON as.user_question_id = uq1.id AND as.answer_id = uq2.answer_id
WHERE uq1.user_id = 1
Okay I'm an idiot! I can't use "as" as an alias of course. Changed it to aa and it worked! W00T!
Upvotes: 1
Views: 781
Reputation: 1777
So here's my new match function. I couldn't put everything in one query yet, because SQLite doesn't support math functions. But as soon as I switch to MySQL I will put everything in just one query. All this already gave me a HUGE performance boost of:
Completed 200 OK in 528ms (Views: 116.5ms | ActiveRecord: 214.0ms)
to match one user with 100 other users. Quite good! I'll have to see how good it performs once I fill my database with 10k fake users. And extra cudos to "Wizard of Ogz" for pointing out my inefficient code!
tried it with only 1000 users, 10 to 100 UserQuestions each, and ...
Completed 200 OK in 104871ms (Views: 2146.0ms | ActiveRecord: 93780.5ms)
... boy did that take long! I will have to think of something to tackle this problem.
def match(user)
if self.common_questions(user) == []
0.to_f
else
@a_score = UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match
FROM (SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ?
INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id
WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ?
WHERE uq1.user_id = ?) AS ps1",user.id, self.id, user.id, self.id]).collect(&:match).first.to_f
@b_score = UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match
FROM (SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ?
INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id
WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ?
WHERE uq1.user_id = ?) AS ps1",self.id, user.id, self.id, user.id]).collect(&:match).first.to_f
match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end
Upvotes: 0
Reputation: 12643
I know you were thinking about moving to a SQL solution, but there are some major performance improvements which can be made to your Ruby code which might eliminate the need to use hand-coded SQL. When optimizing your code it is often worth using a profiler to make sure you really know which parts are the problem. In your example I think some big improvements can be made by removing iterative code and database queries which are executed during each iteration!
Also, if you are using a recent version of ActiveRecord you can generate queries with subselects without the need to code any SQL. Of course it is important that you have proper indexes created for your database.
I'm making a lot of assumptions about your models and relationships based on what I can infer from your code. If I'm wrong let me know and I'll try to make some adjustments accordingly.
def match(user)
if self.common_questions(user) == []
0.to_f
else
# Move a_score and b_score calculation inside this conditional branch since it is otherwise not needed.
@a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
@b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end
def possible_score(user)
# If user_questions.importance contains ID values of importances, then you should set up a relation between UserQuestion and Importance.
# I.e. UserQuestion belongs_to :importance, and Importance has_many :user_questions.
# I'm assuming that user_questions represents join models between users and questions.
# I.e. User has_many :user_questions, and User has_many :questions, :through => :user_questions.
# Question has_many :user_questions, and Question has_many :users, :through => :user_questions
# From your code this seems like the logical setup. Let me know if my assumption is wrong.
self.user_questions.
joins(:importance). # Requires the relation between UserQuestion and Importance I described above
where(:question_id => Question.joins(:user_questions).where(:user_id => user.id)). # This should create a where clause with a subselect with recent versions of ActiveRecord
sum(:value) # I'm also assuming that the importances table has a `value` column.
end
def actual_score(user)
user_questions.
joins(:importance, :accepted_answers). # It looks like accepted_answers indicates an answers table
where(:answer_id => Answer.joins(:user_questions).where(:user_id => user.id)).
sum(:value)
end
UserQuestion seems to be a super join model between User, Question, Answer and Importance. Here are the model relations relevant to the code (not including the has_many :through relations you could create). I think you probably have these already:
# User
has_many :user_questions
# UserQuestion
belongs_to :user
belongs_to :question
belongs_to :importance, :foreign_key => :importance # Maybe rename the column `importance` to `importance_id`
belongs_to :answer
# Question
has_many :user_questions
# Importance
has_many :user_questions
# Answer
has_many :user_questions
Upvotes: 1