Sercan
Sercan

Reputation: 325

What kind of algorithm should I use to find similarities in a db?

Let's say I have 1000 users for my app. I ask them 100 questions with answers just yes/no and I record those answers in a seperate table.

Now, I want to see people who has given the same answers to at least 20 questions.

What kind of algorithm should I follow in order to do this? What are the relevant keywords for googling?

P.S. I work in a WAMP environment.

Upvotes: 0

Views: 57

Answers (1)

Matt Raines
Matt Raines

Reputation: 4218

Join your answers table to itself, selecting answers which share the same question_id and answer but have a different user_id. Group the rows by both user_ids and use a HAVING clause to exclude those with less than 20 matching answers.

Example where you are looking for users similar to your user with user_id "1":

SELECT DISTINCT a2.user_id FROM answers a
INNER JOIN answers a2
        ON a.question_id = a2.question_id
       AND a.answer = a2.answer
       AND a.user_id != a2.user_id
WHERE a.user_id = 1
GROUP BY a.user_id, a2.user_id
HAVING COUNT(*) >= 20;

Technically you don't need to group by a.user_id in this case but I've left it there in case you want to modify the WHERE clause to return results for more than one a.user_id.

Upvotes: 4

Related Questions