Developer
Developer

Reputation: 2706

How to get Similar record with percentage in MySQL?

I have two tables. I want to get similar record with percentage.

Please look my code:

users

+----+------------+----------+
| id | fname      | lname    |
+----+------------+----------+
|  1 | sanjib     | pradhan  |
|  2 | aruna      | avipsa   |
|  3 | chinu      | sahu     |
|  4 | sradha     | behera   |
|  5 | debasish   | dash     |
|  6 | pikun      | behera   |
|  7 | pradep     | hardy    |
+----+------------+----------+

user_tags

+-----+---------+------------+
| id  | user_id | tag        |
+-----+---------+------------+
|  1  |  1      | php        |
|  2  |  1      | java       |
|  3  |  2      | java       |
|  4  |  1      | dotnet     |
|  5  |  2      | oracle     |
|  6  |  3      | oracle     |
|  7  |  3      | python     |
|  8  |  4      | sql        |
|  9  |  3      | java       |
+-----+---------+------------+

There have two table users and user_tags I need 3 records percentage. user_id 1 having 3 tags php, java, dotnet user_id 2 having 2 tags, user_id 3 having 3 tags and 4 having 1 tag.

Suppose my query based on id=2 in the users table. There have 2 tags on the tags table. I need these two tags with similar to other users tag.

user_id=2

tags=java,oracle


user_id=1

tags=php,java,dotnet (matches tag java is exist - percent should be number of matches tag/total tag) means 1/3


user_id=3

tags=oracle, python, java 2 tags matches(java and oracle) percent should be 2/3


user_id = 4, tag=sql (No matches tag so percent is 0%)

How to manage this by using MySQL and php code?

Upvotes: 0

Views: 321

Answers (1)

pala_
pala_

Reputation: 9010

We can get the answer you want by left joining all the users with their tags, to user 2 with their tags. Then we just count the appropriate values to get your percentage.

select u1.id, 
       count(ut2.tag) matches, 
       count(ut1.tag) total, 
       count(ut2.tag) / count(ut1.tag) pct
  from users u1
    inner join user_tags ut1
      on u1.id = ut1.user_id
    left join
      users u2
        inner join user_tags ut2
          on u2.id = ut2.user_id
    on ut2.tag = ut1.tag
      and u2.id = 2
  where u1.id <> 2
  group by u1.id

demo here

How does this work? Before the group by and aggregate functions, the rows returned look like this:

id  fname   lname   id  user_id tag     id      fname   lname   id      user_id tag
1   sanjib  pradhan 2   1       java    2       aruna   avipsa  3       2       java
3   chinu   sahu    9   3       java    2       aruna   avipsa  3       2       java
3   chinu   sahu    6   3       oracle  2       aruna   avipsa  5       2       oracle
1   sanjib  pradhan 1   1       php     (null)  (null)  (null)  (null)  (null)  (null)
1   sanjib  pradhan 4   1       dotnet  (null)  (null)  (null)  (null)  (null)  (null)
3   chinu   sahu    7   3       python  (null)  (null)  (null)  (null)  (null)  (null)
4   sradha  behera  8   4       sql     (null)  (null)  (null)  (null)  (null)  (null)

As you can see - where the tag for the user on the left does not match the tag for the user on the right, null values are entered. The count function in mysql ignores null values, and does not count them. Because of this, when we group by u1.id (the id value at the far left) and count the number of tags they have (ut1.tag), we get the total of all of their tags. However, when we count the value for ut2.tag, we only get the total that aren't null, ie, only the ones that match. This allows us to get your percentage.

edit for comment

You added the requirement in the comments to be able to compare against multiple users at once, since this means it may match the same tag more than once, we need to only count the distinct elements.

select u1.id, 
       count(distinct ut2.tag) matches, 
       count(ut1.tag) total, 
       count(distinct ut2.tag) / count(ut1.tag) pct
  from users u1
    inner join user_tags ut1
      on u1.id = ut1.user_id
    left join
      users u2
        inner join user_tags ut2
          on u2.id = ut2.user_id
    on ut2.tag = ut1.tag
      and u2.id IN (2,3)
  where u1.id NOT IN (2,3)
  group by u1.id
  having count(ut2.tag) > 0;

With your updated fiddle here

Upvotes: 3

Related Questions