Reputation: 2333
I have a dataframe as follows:
project_id member_id
1 A
1 B
1 C
2 A
2 D
2 B
I want to find all pairs of people that have worked together on at least one project. So the resulting dataframe should look like this:
member_id co_member_id
A B
A C
A D
B A
B C
B D
C A
C B
D A
D B
One way I could think of is to use df.groupby('project_id'), but then I would have to calculate the pairwise permutations of the unique members within each project_id and then drop any duplicate pairings in the resulting df. I was wondering if there was a more efficient way to do this.
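The groupby-then-permute approach described in the question can be sketched as follows. This is a minimal sketch, assuming the data frame has exactly the project_id and member_id columns shown, and that each member appears at most once per project (otherwise permutations would also yield self-pairs such as ('A', 'A')). The output column names are taken from the desired result.

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({'project_id': [1, 1, 1, 2, 2, 2],
                   'member_id': ['A', 'B', 'C', 'A', 'D', 'B']})

# For each project, generate every ordered pair of distinct members
pairs = [
    pair
    for _, members in df.groupby('project_id')['member_id']
    for pair in permutations(members, 2)
]

# Pairs shared by several projects appear more than once, so drop duplicates,
# then sort so the rows come out in a predictable order
result = (pd.DataFrame(pairs, columns=['member_id', 'co_member_id'])
          .drop_duplicates()
          .sort_values(['member_id', 'co_member_id'])
          .reset_index(drop=True))
print(result)
```

Deduplication happens only once, after all projects have been processed, so the per-project work is just itertools.permutations.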
Upvotes: 0
Views: 223
Reputation: 4004
A great answer above by jp_data_analysis. However, you are losing the information about the projects, which may not always be desired. The code below keeps all of that information in a few lines, without any explicit loops.
import pandas as pd
# Create data frame
project_id = [1, 1, 1, 2, 2, 2]
member_id = ['A', 'B', 'C', 'A', 'D', 'B']
df = pd.DataFrame({'project_id': project_id, 'member_id': member_id})
# New data frame with co_member_id
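df1 = pd.merge(df, df, how='inner', on=['project_id'])
# Drop rows that pair a member with themselves
df1 = df1[df1['member_id_x'] != df1['member_id_y']]
# Rename by name rather than by position: the merged column order depends on the column order of df
df1 = df1.rename(columns={'member_id_x': 'member_id', 'member_id_y': 'co_member_id'})[['member_id', 'project_id', 'co_member_id']]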
print(df1)
member_id project_id co_member_id
1 A 1 B
2 A 1 C
3 B 1 A
5 B 1 C
6 C 1 A
7 C 1 B
10 A 2 D
11 A 2 B
12 D 2 A
14 D 2 B
15 B 2 A
16 B 2 D
Alternatively, grouping the original data frame by project gives a very succinct summary of who worked on each project:
# Concatenate the member ids of each project into one string
df3 = df.groupby('project_id').agg(''.join)
print(df3)
member_id
project_id
1 ABC
2 ADB
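If only the distinct member pairs from the question are needed, the project information in df1 can be discarded afterwards. A minimal sketch, reusing the self-merge approach above (rebuilt here so it runs on its own) and renaming the suffixed columns by name; the variable name pairs is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'project_id': [1, 1, 1, 2, 2, 2],
                   'member_id': ['A', 'B', 'C', 'A', 'D', 'B']})

# Self-join on project_id, then drop rows pairing a member with themselves
df1 = pd.merge(df, df, how='inner', on='project_id')
df1 = df1[df1['member_id_x'] != df1['member_id_y']]
df1 = df1.rename(columns={'member_id_x': 'member_id', 'member_id_y': 'co_member_id'})

# Keep one row per (member, co-member) pair, regardless of project
pairs = (df1[['member_id', 'co_member_id']]
         .drop_duplicates()
         .sort_values(['member_id', 'co_member_id'])
         .reset_index(drop=True))
print(pairs)
```

Since df1 still holds project_id, the same frame can answer both "who worked together" and "on which project".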
Upvotes: 2
Reputation: 12679
You can try something like this:
import itertools

project_id = [1, 1, 1, 2, 2, 2]
member_id = ['A', 'B', 'C', 'A', 'D', 'B']

# Map each project to the list of its members
track = {}
for pid, member in zip(project_id, member_id):
    if pid not in track:
        track[pid] = [member]
    else:
        track[pid].append(member)

# Collect every ordered pair of co-members, skipping duplicates
combination = []
for members in track.values():
    for pair in itertools.permutations(members, r=2):
        if pair not in combination:
            combination.append(pair)

# Group the sorted pairs by their first member
print({m: list(l) for m, l in itertools.groupby(sorted(combination), lambda x: x[0])})
output:
{'A': [('A', 'B'), ('A', 'C'), ('A', 'D')], 'B': [('B', 'A'), ('B', 'C'), ('B', 'D')], 'C': [('C', 'A'), ('C', 'B')], 'D': [('D', 'A'), ('D', 'B')]}
Upvotes: 0
Reputation: 164753
This is a functional method which does not rely on pandas:
from itertools import permutations
from collections import defaultdict
project_id = [1, 1, 1, 2, 2, 2]
member_id = ['A', 'B', 'C', 'A', 'D', 'B']
d = defaultdict(list)
# create dictionary of project -> members
for i, j in zip(project_id, member_id):
    d[i].append(j)
# permute pairs and get union
set.union(*(set(permutations(v, 2)) for v in d.values()))
# {('A', 'B'),
# ('A', 'C'),
# ('A', 'D'),
# ('B', 'A'),
# ('B', 'C'),
# ('B', 'D'),
# ('C', 'A'),
# ('C', 'B'),
# ('D', 'A'),
# ('D', 'B')}
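The set of tuples above can be turned into the data frame requested in the question. A minimal sketch, assuming pandas is available after all and that sorted row order is wanted; the column names come from the question. It uses set().union(...) instead of set.union(...) so that it also works when there are no projects at all (set.union with no arguments raises a TypeError).

```python
from collections import defaultdict
from itertools import permutations

import pandas as pd

project_id = [1, 1, 1, 2, 2, 2]
member_id = ['A', 'B', 'C', 'A', 'D', 'B']

# Group members by project
d = defaultdict(list)
for i, j in zip(project_id, member_id):
    d[i].append(j)

# Union of the ordered pairs from every project (empty set if d is empty)
pairs = set().union(*(set(permutations(v, 2)) for v in d.values()))

# Sets are unordered, so sort the tuples before building the frame
result = pd.DataFrame(sorted(pairs), columns=['member_id', 'co_member_id'])
print(result)
```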
Upvotes: 3