Reputation: 1777
EDIT: I know it's been over a year, but I finally got something new to this problem. To see an update for this look at this question: Rails 3 user matching-algorithm to SQL Query (COMPLICATED)
I'm working on a site where users are matched based on answered questions.
The match percentage is calculated each time a user, for example, visits another users profile page. So the matching percentage is not stored in the database and is recalculated all the time.
Now I want to build in a search where users can search for their best match.
The question I have is, what is the most efficient way to do this?
What if I have 50k users and I have to list them ordered by match percentages. Do I have to calculate each matching percentage between one and the other 50k users and then create a list out of that? Sounds kind of inefficient to me. Wouldn't that slow down the app drastically?
I hope someone can help me with this, because this gives me kind of a headache.
EDIT: To clear things up a bit, here is my database model for user, questions, answers, user_answers and accepted_answers:
Tables:
Users(:id, :username, etc.)
Questions(:id, :text)
Answers(:id, :question_id, :text)
UserAnswers(:id, :user_id, :question_id, :answer_id, :importance)
AcceptedAnswers(:id, :user_answer_id, :answer_id)
Questions <-> Answers: one-to-many
Questions <-> UserAnswers: one-to-many
Users <-> UserAnswers: one-to-many
UserAnswers <-> AcceptableAnswers: one-to-many
So there is a list of Questions(with possible answers to this question) and Users give their "UserAnswers" to those questions, assign how important that question is to them and what answers they accept from other users.
Then if you take User1 and User2, you look for common answered questions, so UserAnswers where the question_id is the same. They have 10 questions in common. User1 gave the importance value 10 to the first five questions and the importance value 20 to the other five. User 2 gave acceptable answers to two 20 value and three 10 value questions. A total of 70 points. The highest reachable pointscore is of course 20x5 + 10x5... So User2 reached 70/150 * 100 = 46,66% ... The same thing is done the other way around for how much User1 reached of User2's assigned points to those questions. Those 2 percentages are then combined through the geometric mean: sqrt of percentage1 * percentage2 ... this gives the final match percentage
Upvotes: 4
Views: 1285
Reputation: 10907
@Wassem's answer seems on spot to your problem. I would also suggest you take an approach where percentages are updated on new answers and new accepted answers.
I have created a db only solution(gist), which would work but has an additional complexity of an intermediate table.
Ideally you should create two more tables, one for importance and another for percentage matches. You should create/insert/delete rows in these tables when user assigns/updates importance to an answer or marks some answer as acceptable. You can also leverage delayed_job or rescue to update the tables in background on the particular actions.
You may need to run the sqls once in while to sync up the data in the two new tables as there can be inconsistencies arising due to concurrency and also due to ordering of update actions in certain cases.
Updates on a accepted answer should be straight forward as you only need to update one pair. But in case somebody assigns importance to a question, there can be a lot calculations and a lot of percentages might need updation. To avoid this you might chose to only maintain the table with sums of importance for each pair, update it when required and calculate actual percentages on the fly(in db off-course).
Upvotes: 1
Reputation: 8402
I suggest you keep the match percentage of all the users in your database. Create a table matches
that has match percentage for a pair of users. You do not need to save match percentage for all the pairs of users in your database. A valid match percentage is calculated for two users only when any one of have them has accepted an answer from other user. Most of the users will not accept the answers of most of other users.
I will suggest you to calculate and save the match percentage not at the time when a user visits another users profile. But when a user accepts another users answers. This will make sure that you do not make any unnecessary calculation and match percentage for a pair of users is always fresh.
Upvotes: 1