BKS

Reputation: 2333

generate every possible permutation

I have a dataframe as follows:

project_id    member_id
1             A
1             B
1             C
2             A
2             D
2             B

I want to find all pairs of people that have worked together on at least one project. So the resulting dataframe should look like this:

member_id    co_member_id
A            B
A            C
A            D
B            A
B            C
B            D
C            A
C            B
D            A
D            B

One approach I could think of is df.groupby('project_id'), but then I would have to compute the pairwise permutations of the unique members within each project_id and drop any duplicate pairings from the resulting df.

I was wondering if there was a more efficient way to do this.
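
For reference, the groupby approach described above could be sketched roughly as follows. The variable names (per_project, pairs, result) are illustrative and not from the original post, and this is only one of several ways to write it.

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({
    'project_id': [1, 1, 1, 2, 2, 2],
    'member_id': ['A', 'B', 'C', 'A', 'D', 'B'],
})

# One list of ordered member pairs per project
per_project = df.groupby('project_id')['member_id'].apply(
    lambda s: list(permutations(s.unique(), 2))
)

# Flatten into a set so pairs seen in several projects appear once, then sort
pairs = sorted({pair for project_pairs in per_project for pair in project_pairs})

result = pd.DataFrame(pairs, columns=['member_id', 'co_member_id'])
print(result)
```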

Upvotes: 0

Views: 223

Answers (3)

KRKirov

Reputation: 4004

jp_data_analysis gave a great answer. However, it loses the information about the projects, which is not always desirable. The code below keeps that information, in three lines and without any explicit loops.

import pandas as pd

# Create data frame
project_id = [1, 1, 1, 2, 2, 2]
member_id = ['A', 'B', 'C', 'A', 'D', 'B']
df = pd.DataFrame({'project_id': project_id, 'member_id': member_id})

# New data frame with co_member_id
df1 = pd.merge(df, df, how='inner', on=['project_id'])
df1 = df1[df1['member_id_x'] != df1['member_id_y']]
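# Rename by name rather than by position, so the labels stay correct whatever the column order
df1 = df1.rename(columns={'member_id_x': 'member_id', 'member_id_y': 'co_member_id'})[['member_id', 'project_id', 'co_member_id']]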

print(df1)

   member_id  project_id co_member_id
1          A           1            B
2          A           1            C
3          B           1            A
5          B           1            C
6          C           1            A
7          C           1            B
10         A           2            D
11         A           2            B
12         D           2            A
14         D           2            B
15         B           2            A
16         B           2            D
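
If only the unique pairs from the question are needed, the project column can be dropped and the duplicates removed. The following is a minimal sketch under that assumption; the names pairs and unique_pairs are illustrative and not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({
    'project_id': [1, 1, 1, 2, 2, 2],
    'member_id': ['A', 'B', 'C', 'A', 'D', 'B'],
})

# Self-merge on project_id pairs every member with every member of the same project
pairs = df.merge(df, on='project_id')
pairs = pairs.rename(columns={'member_id_x': 'member_id', 'member_id_y': 'co_member_id'})

# Drop the rows that pair a member with themselves
pairs = pairs[pairs['member_id'] != pairs['co_member_id']]

# Keep each (member, co_member) pair once, regardless of how many projects share it
unique_pairs = (
    pairs[['member_id', 'co_member_id']]
    .drop_duplicates()
    .sort_values(['member_id', 'co_member_id'])
    .reset_index(drop=True)
)
print(unique_pairs)
```

The A/B pairing occurs in both projects, so drop_duplicates is what reduces the twelve rows to the ten in the question.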

Indexing by the pair gives a multi-index view of the result, and a groupby on the original frame gives a very succinct per-project summary:

df2 = df1.set_index(['member_id', 'co_member_id'])
df3 = df.groupby('project_id').sum()
print(df3)

           member_id
project_id          
1                ABC
2                ADB

Upvotes: 2

Aaditya Ura

Reputation: 12679

You can try something like this:

import itertools

project_id = [1, 1, 1, 2, 2, 2]
member_id = ['A', 'B', 'C', 'A', 'D', 'B']

# Map each project to the list of its members
track = {}
for project, member in zip(project_id, member_id):
    track.setdefault(project, []).append(member)

# Collect every ordered pair once, across all projects
combination = []
for members in track.values():
    for pair in itertools.permutations(members, r=2):
        if pair not in combination:
            combination.append(pair)

# Group the sorted pairs by their first member
print({m: list(l) for m, l in itertools.groupby(sorted(combination), lambda x: x[0])})

output:

{'A': [('A', 'B'), ('A', 'C'), ('A', 'D')], 'B': [('B', 'A'), ('B', 'C'), ('B', 'D')], 'C': [('C', 'A'), ('C', 'B')], 'D': [('D', 'A'), ('D', 'B')]}

Upvotes: 0

jpp

Reputation: 164753

This is a functional method which does not rely on pandas:

from itertools import permutations
from collections import defaultdict

project_id = [1, 1, 1, 2, 2, 2]
member_id = ['A', 'B', 'C', 'A', 'D', 'B']

d = defaultdict(list)

# create dictionary of project -> members
for i, j in zip(project_id, member_id):
    d[i].append(j)

# permute pairs and get union
set.union(*(set(permutations(v, 2)) for v in d.values()))

# {('A', 'B'),
#  ('A', 'C'),
#  ('A', 'D'),
#  ('B', 'A'),
#  ('B', 'C'),
#  ('B', 'D'),
#  ('C', 'A'),
#  ('C', 'B'),
#  ('D', 'A'),
#  ('D', 'B')}
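
A set has no defined order, so the pairs will not necessarily come out in the order shown in the question. A minimal sketch of one way to get stable rows, assuming plain sorting of the tuples is acceptable (the names pairs and rows are illustrative):

```python
from collections import defaultdict
from itertools import permutations

project_id = [1, 1, 1, 2, 2, 2]
member_id = ['A', 'B', 'C', 'A', 'D', 'B']

# Dictionary of project -> members, as above
d = defaultdict(list)
for project, member in zip(project_id, member_id):
    d[project].append(member)

# Union of the ordered pairs from every project
pairs = set.union(*(set(permutations(v, 2)) for v in d.values()))

# Sort the tuples so the rows come out in a stable order
rows = sorted(pairs)
for member, co_member in rows:
    print(member, co_member)
```

If a dataframe is needed at the end, rows can be passed to pd.DataFrame(rows, columns=['member_id', 'co_member_id']).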

Upvotes: 3
