Reputation: 227
I have a data set with two columns, user_id and interests, and I want to find users who have interests in common. For example, I would take the first user's interests and compare them with every other user's interests individually, then take the second user and compare their interests with all the remaining users' interests, and so on.
My data looks like:
userid  interest
1       [A, B]
2       [A, C, B]
3       [B, D]
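I am not sure how to do this. My attempt so far:
for i in range(0, 3):
    for j in range(i + 1, 3):
        # set(...) so that .intersection also works when the interests are stored as lists
        print(set(df['interest'].loc[i]).intersection(df['interest'].loc[j]))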
My output should be:
userid  relativeid  common interest
1       2           [A, B]
1       3           [B]
2       3           [B]
Upvotes: 1
Views: 537
Reputation: 402553
Use a dictionary for the lookup. You can then generate every pair of "userid" values with itertools.combinations
and perform a set intersection on the interests of each "userid" pair.
import itertools
m = df.set_index('userid')['interest'].map(set).to_dict()
m
# {1: {'A', 'B'}, 2: {'A', 'B', 'C'}, 3: {'B', 'D'}}
out = pd.DataFrame(
    itertools.combinations(df.userid, 2), columns=['userid', 'relativeid'])
out['common_interest'] = [list(m[x] & m[y]) for x, y in out.values]
out
   userid  relativeid common_interest
0       1           2          [B, A]
1       1           3             [B]
2       2           3             [B]
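For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. It assumes the sample data from the question with interests stored as Python lists, and it swaps list(...) for sorted(...) so the order inside each result is stable (converting a set to a list gives no guaranteed order, which is why [B, A] appears above). The name pairs is only introduced for illustration.

```python
import itertools

import pandas as pd

# Sample data from the question; interests are assumed to be plain lists.
df = pd.DataFrame({
    "userid": [1, 2, 3],
    "interest": [["A", "B"], ["A", "C", "B"], ["B", "D"]],
})

# Map each userid to a set of interests so intersections are cheap.
m = df.set_index("userid")["interest"].map(set).to_dict()

# Every unordered pair of users, each pair appearing once.
pairs = list(itertools.combinations(df["userid"], 2))
out = pd.DataFrame(pairs, columns=["userid", "relativeid"])

# sorted() turns each intersection into a list with a deterministic order.
out["common_interest"] = [sorted(m[x] & m[y]) for x, y in pairs]
print(out)
```

Note that the number of pairs grows quadratically with the number of users, so this is best suited to modest data sizes.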
Upvotes: 1
Reputation: 14226
Here is how I would solve it; it's possible someone has a fancier pandas way.
from itertools import combinations
cs = combinations(df.userid.values, 2)
output = pd.DataFrame(list(cs), columns=['userid', 'relativeid'])
print(output)
   userid  relativeid
0       1           2
1       1           3
2       2           3
def intersect(row):
    p1 = df.loc[df.userid == row['userid'], 'interest'].values[0]
    p2 = df.loc[df.userid == row['relativeid'], 'interest'].values[0]
    return list(set(p1).intersection(set(p2)))
output.assign(common_interest=output.apply(intersect, axis=1))
   userid  relativeid common_interest
0       1           2          [B, A]
1       1           3             [B]
2       2           3             [B]
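A possible variation on the apply approach, as a sketch only: the intersect function above scans the whole frame with a boolean mask twice per row. If userid is unique (an assumption here), the interests can be indexed by userid once and then fetched by label. The names interests and result are introduced for illustration, and the sample data is taken from the question.

```python
from itertools import combinations

import pandas as pd

# Sample data from the question; interests are assumed to be plain lists.
df = pd.DataFrame({
    "userid": [1, 2, 3],
    "interest": [["A", "B"], ["A", "C", "B"], ["B", "D"]],
})

# Index the interests by userid once, converting each list to a set.
# Assumes userid values are unique.
interests = df.set_index("userid")["interest"].map(set)

output = pd.DataFrame(
    list(combinations(df["userid"], 2)), columns=["userid", "relativeid"])

def intersect(row):
    # .at does a label lookup on the userid index instead of masking the frame.
    common = interests.at[row["userid"]] & interests.at[row["relativeid"]]
    return sorted(common)  # sorted for a stable order within each list

result = output.assign(common_interest=output.apply(intersect, axis=1))
print(result)
```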
Upvotes: 1