Reputation: 45
I have the following dataset:
d = {
'Company':['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]
}
Now I need to create a list in Python of all pairs of 'Company' values that share a value in 'Individual'.
E.g. the output for the dataset above should be ((A,B),(A,C),(B,C),(C,D)). The first three tuples arise because Individual 1 is affiliated with A, B and C, and the last one because Individual 10 is affiliated with C and D.
Further explanation: for Individual = 1, the dataset has the values 'A', 'B' and 'C'. I want to create all unique combinations of these three values as tuples, so the list should contain (A,B), (A,C) and (B,C). Next is Individual = 2, which only has the value 'A', so there is no tuple to append to the list. Each of the following individuals also has only one corresponding company, hence no further pairs. The only other tuple to add is for Individual = 10, which has the values 'C' and 'D' and therefore adds the tuple (C,D) to the list.
Upvotes: 3
Views: 991
Reputation: 164623
Here is a solution to your refined question:
from collections import defaultdict
from itertools import combinations
data = {'Company':['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]}
d = defaultdict(set)
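# Collect the set of companies each individual is affiliated with
for i, j in zip(data['Individual'], data['Company']):
    d[i].add(j)
# Build every sorted pair of companies per individual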
res = {k: sorted(map(sorted, combinations(v, 2))) for k, v in d.items()}
# {1: [['A', 'B'], ['A', 'C'], ['B', 'C']],
# 2: [],
# 3: [],
# 4: [],
# 5: [],
# 6: [],
# 7: [],
# 8: [],
# 9: [],
# 10: [['C', 'D']],
# 11: [],
# 12: [],
# 13: []}
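The dictionary above keeps the pairs grouped per individual. If a single flat list of unique pairs is wanted, as the question describes, one possible sketch follows. The names `companies_by_individual`, `pairs` and `pair_list` are illustrative and not part of the original answer, and tuples are used instead of lists so the pairs can be stored in a set.

```python
from collections import defaultdict
from itertools import combinations

data = {
    'Company': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
                'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
    'Individual': [1, 2, 3, 4, 1, 5, 6, 7, 1, 8, 9, 10, 10, 11, 12, 13],
}

# Map each individual to the set of companies they appear with
companies_by_individual = defaultdict(set)
for individual, company in zip(data['Individual'], data['Company']):
    companies_by_individual[individual].add(company)

# Collect unique pairs; sorting each group first makes (A, B) and (B, A) identical
pairs = set()
for companies in companies_by_individual.values():
    pairs.update(combinations(sorted(companies), 2))

pair_list = sorted(pairs)
print(pair_list)
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

Individuals with a single company contribute nothing, because `combinations` of a one-element sequence taken two at a time is empty.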
Upvotes: 1
Reputation: 11192
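Try this:
# Assumes df = pd.DataFrame(d); the output shown matches the earlier 9-row version of the dataset
temp = df[df.duplicated(subset=['Individual'], keep=False)]
print(temp.groupby(['Individual'])['Company'].unique())
# 1    [A, B]
# 3    [A, C]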
Upvotes: 1
Reputation: 164623
One solution is to use pandas:
import pandas as pd
d = {'Company':['A','A','A','B','B','B','C','C','C'],'Individual': [1,2,3,1,4,5,3,6,7]}
df = pd.DataFrame(d).groupby('Individual')['Company'].apply(list).reset_index()
companies = df.loc[df['Company'].map(len)>1, 'Company'].tolist()
# [['A', 'B'], ['A', 'C']]
This isn't the most efficient way, but it may be intuitive.
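For the refined dataset in the question, where one individual can belong to more than two companies, the grouped lists would still need to be expanded into pairs. A rough sketch of that step, assuming pandas is installed and using illustrative names (`grouped`, `pandas_pairs`) that are not part of the original answer:

```python
from itertools import combinations

import pandas as pd

data = {
    'Company': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
                'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
    'Individual': [1, 2, 3, 4, 1, 5, 6, 7, 1, 8, 9, 10, 10, 11, 12, 13],
}
df = pd.DataFrame(data)

# One sorted, de-duplicated list of companies per individual
grouped = df.groupby('Individual')['Company'].apply(lambda s: sorted(set(s)))

# Expand each list into 2-combinations and keep each pair only once
pandas_pairs = sorted({pair
                       for companies in grouped
                       for pair in combinations(companies, 2)})
print(pandas_pairs)
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```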
Upvotes: 4