Ruslan  Galimov
Ruslan Galimov

Reputation: 256

Find the same sets of pairs

I have such scheme in mysql:

TableA (id integer PK, pid integer, mid integer)

Ex. data:

id  |  pid  |  mid  
1   |  2    |   2
2   |  2    |   4
3   |  3    |   4
4   |  4    |   2
5   |  4    |   4
6   |  3    |   2
7   |  3    |   5

I have pid with some mid's and want to find all pid's with the same set of mid's. In example for pid=2 answer is 2,4

group_concat is not suitable for me

I think it should be simple, but the answer eludes me

UPD: I have tried group_concat:

SELECT DISTINCT(b.pid) FROM (SELECT pid, group_concat(mid) as concated FROM TableA where pid=100293) as a, (select pid, group_concat(mid) as concated, COUNT(1) as count FROM TableA group by pid) as b where a.concated=b.concated;

Upvotes: 0

Views: 72

Answers (3)

JNevill
JNevill

Reputation: 50064

Since you are working with integers, instead of group_concat you could generate a bitmask on distinct mid values for each pid and join on that. Then it's just math all the way down:

SELECT DISTINCT pid 
FROM (SELECT pid, sum(pow(2,mid)) as midmask FROM (SELECT distinct pid, mid FROM tableA) as t1a GROUP BY pid) as t1
    INNER JOIN  (SELECT pid, sum(pow(2,mid)) as midmask FROM (SELECT distinct pid, mid FROM tableA) as t2a GROUP BY pid) as t2
        ON t1.midmask = t2.midmask

IF mid is already distinct for each pid then you can get rid of the inner-inner subqueries.

Using @GordonLinoff's excellent single-subquery approach where GROUP_CONCAT is only used on the main query (where it won't be so expensive). Instead of the group_concat on the inner query we use the bitmask approach that may be quicker.

SELECT midmask>>1, group_concat(pid) 
FROM (SELECT pid, sum(pow(2,mid)) as midmask FROM (SELECT distinct pid, mid FROM tableA) as t1a GROUP BY pid) as t1
GROUP BY midmask;

Results:

+---------+-------------------+
| midmask | group_concat(pid) |
+---------+-------------------+
|      10 |               2,4 |
|      26 |                 3 |
+---------+-------------------+

Obviously that midmask in the result set isn't super necessary, but you can pick out the values from the bitmask if you want to see the mid values that contributed to the match if you like.

I'm using the bit right-shift operator to insure that the proper bit is set in the midmask result otherwise you'll be off by one. If you don't care about the output of the midmask, then don't bother with the >>1 portion of the query.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269943

In MySQL, I would approach this using group_concat():

select mids, group_concat(pid)
from (select pid, group_concat(mid order by mid) as mids
      from t
      group by pid
     ) t
group by mids;

This solves the general problem, for all pids. Solving for 1 pid is a bit tricky in MySQL (no window functions), but you can try:

select t.pid, t2.pid, count(*)
from t join
     t t2
     on t.mid = t2.mid and t2.pid = 2
group by t.pid, t2.pid
having count(*) = (select count(*) from t where t.pid = t.pid) and
       count(*) = (select count(*) from t where t.pid = t2.pid);

For this, you want indexes on t(mid, pid) and t(pid).

Upvotes: 0

Ketan Patil
Ketan Patil

Reputation: 1272

You can use this query. It will give you comma separated pids.

select `mid`, group_concat(`pid`) from `tableA` group by `mid`;

Upvotes: 0

Related Questions