Reputation: 13
I have a table in sql server 2012 that have this columns: user_id , merchant_id
I want to find top 5 similar partners for each merchant. The similarity is simply defined by normalized number of overlapping costumers;
i can not find any solution for this problem.
Upvotes: 1
Views: 160
Reputation: 1270431
The following query counts the number of common customers for two merchants:
select t.merchantid as m1, t2.merchantid as m2, count(*) as common_customers
from table t join
table t2
on t.customerid = t2.customerid and t.merchantid <> t2.merchantid
group by t.merchantid, t2.merchantid;
The following gets the five based on the raw couns:
select *
from (select t.merchantid as m1, t2.merchantid as m2, count(*) as common_customers,
row_number() over (partition by t.merchantid order by count(*) desc) as seqnum
from table t join
table t2
on t.customerid = t2.customerid and t.merchantid <> t2.merchantid
group by t.merchantid, t2.merchantid
) mm
where seqnum <= 5;
I do not know what you mean by "normalized". The term "normalized" in statistics would often not change the ordering of values (but would result in the sum of the squares being 1), so this may do what you want.
Upvotes: 2