Reputation: 45

Create list with all unique possible combination based on condition in dataframe in Python

I have the following dataset:

d = {
'Company':['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]
}

I need to create a list in Python of all pairs of 'Company' values that share the same 'Individual'.

For the dataset above, the output should be ((A,B),(A,C),(B,C),(C,D)). The first three tuples arise because Individual 1 is affiliated with A, B and C, and the last one because Individual 10 is affiliated with C and D.

Further explanation: for Individual = 1, the dataset has the Company values 'A', 'B' and 'C'. I want every unique combination of two of these three values, so the list should contain the tuples (A,B), (A,C) and (B,C). Next is Individual = 2, which only has the value 'A', so there is no tuple to append. Individuals 3 to 9 and 11 to 13 likewise have only one company each, so they add no pairs. The only other tuple to add is for Individual = 10, which has the values 'C' and 'D' and therefore contributes (C,D).

Upvotes: 3

Answers (3)

jpp

Reputation: 164623

Here is a solution to your refined question:

from collections import defaultdict
from itertools import combinations

data = {'Company':['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
        'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]}

d = defaultdict(set)

for i, j in zip(data['Individual'], data['Company']):
    d[i].add(j)

res = {k: sorted(map(sorted, combinations(v, 2))) for k, v in d.items()}

# {1: [['A', 'B'], ['A', 'C'], ['B', 'C']],
#  2: [],
#  3: [],
#  4: [],
#  5: [],
#  6: [],
#  7: [],
#  8: [],
#  9: [],
#  10: [['C', 'D']],
#  11: [],
#  12: [],
#  13: []}
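
The question asks for a single flat list of unique pairs rather than a dictionary keyed by individual. A minimal sketch of that last step, using the same grouping idea; the names `companies_by_individual` and `pairs` are illustrative and not part of the answer above:

```python
from collections import defaultdict
from itertools import combinations

data = {'Company': ['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
        'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]}

# Collect the set of companies each individual is affiliated with
companies_by_individual = defaultdict(set)
for individual, company in zip(data['Individual'], data['Company']):
    companies_by_individual[individual].add(company)

# Build every 2-combination per individual; the set comprehension removes
# pairs produced by more than one individual, and sorting the companies
# first keeps each pair in a consistent order, e.g. ('A', 'B') not ('B', 'A')
pairs = sorted({
    pair
    for companies in companies_by_individual.values()
    for pair in combinations(sorted(companies), 2)
})

print(pairs)
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

Individuals with a single company yield no combinations, so they drop out without an explicit length check.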

Upvotes: 1

Mohamed Thasin ah

Reputation: 11192

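Try this:

# Assumes df holds the question's data, e.g. df = pd.DataFrame(d)
# Keep only the rows whose Individual appears more than once
temp = df[df.duplicated(subset=['Individual'], keep=False)]
print(temp.groupby(['Individual'])['Company'].unique())

# Output for the smaller dataset used in the pandas answer below:
# 1    [A, B]
# 3    [A, C]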

Upvotes: 1

jpp

Reputation: 164623

One solution is to use pandas:

import pandas as pd

d = {'Company':['A','A','A','B','B','B','C','C','C'],'Individual': [1,2,3,1,4,5,3,6,7]}

df = pd.DataFrame(d).groupby('Individual')['Company'].apply(list).reset_index()
companies = df.loc[df['Company'].map(len)>1, 'Company'].tolist()

# [['A', 'B'], ['A', 'C']]

This isn't the most efficient way, but it may be intuitive.
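
The snippet above returns the full list of companies per individual, which matches the requested pairs only when nobody has more than two companies. A possible extension for the question's larger dataset, sketched with `itertools.combinations` on top of the grouped result; the variable names `grouped` and `pairs` are assumptions made for illustration:

```python
from itertools import combinations

import pandas as pd

d = {'Company': ['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
     'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]}

df = pd.DataFrame(d)

# One array of distinct companies per individual
grouped = df.groupby('Individual')['Company'].unique()

# Expand each group into its 2-combinations and de-duplicate across individuals
pairs = sorted({
    pair
    for companies in grouped
    for pair in combinations(sorted(companies), 2)
})

print(pairs)
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```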

Upvotes: 4
