Reputation: 87
I have a dataframe like this
+---+---+
| A| B|
+---+---+
| 1| 3|
| 1| 1|
| 1| 2|
| 1| 5|
| 2| 5|
| 2| 2|
| 2| 1|
| 3| 1|
| 3| 2|
| 3| 5|
| 4| 3|
| 4| 4|
| 5| 4|
| 5| 3|
| 6| 2|
| 6| 5|
| 6| 1|
| 6| 3|
| 7| 5|
| 7| 4|
| 7| 3|
+---+---+
I want to count how often two values of B occur together for the same A, so that I can find the most common combinations of two B values (the order doesn't matter).
I want the result to be: [1,2],[1,5],[1,2] and [3,4],
as these are the values of B that appear together most often (I mean for the same A).
I have tried this:
oc=pd.DataFrame(columns=['A','B_combination'])
oc['B_combination']=df.astype('str').groupby('A')['B'].agg([ ';'.join,lambda x: set(x.tolist())])['<lambda_0>'].values
oc['A']=df.astype('str').groupby('A')['B'].agg([ ';'.join,lambda x: set(x.tolist())])['<lambda_0>'].index
to get the distinct combinations, like this:
|A |B_combination|
---+--------------
|1 |{2, 1, 3, 5} |
|2 |{2, 1, 5} |
|3 |{2, 1, 5} |
|4 |{4, 3} |
|5 |{4, 3} |
|6 |{2, 1, 3, 5} |
|7 |{4, 3, 5} |
But when I apply
oc.groupby('B_combination').count()
to get the most common combination, it doesn't work because each value is a set. I tried converting the sets to lists, but that didn't work either.
Upvotes: 3
Views: 73
Reputation: 150825
Let's try itertools.combinations with groupby():
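from itertools import combinations

(df.groupby('A')['B']
   # sort each pair so that (1, 2) and (2, 1) count as the same combination
   .apply(lambda x: pd.Series([tuple(sorted(p)) for p in combinations(x, 2)]).value_counts())
   .reset_index()
)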
Output:
A level_1 B
0 1 (3, 5) 1
1 1 (2, 5) 1
2 1 (3, 1) 1
3 1 (3, 2) 1
4 1 (1, 5) 1
5 1 (1, 2) 1
6 2 (2, 1) 1
7 2 (5, 1) 1
8 2 (5, 2) 1
9 3 (2, 5) 1
10 3 (1, 5) 1
11 3 (1, 2) 1
12 4 (3, 4) 1
13 5 (4, 3) 1
14 6 (5, 3) 1
15 6 (2, 1) 1
16 6 (2, 3) 1
17 6 (1, 3) 1
18 6 (2, 5) 1
19 6 (5, 1) 1
20 7 (5, 3) 1
21 7 (5, 4) 1
22 7 (4, 3) 1
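The per-group counts above are all 1, so to rank pairs by how many A groups they share, the pairs still have to be tallied across groups. A minimal sketch of one way to do that, assuming the sample data from the question; the helper name pair_counts is hypothetical, and collections.Counter is swapped in for value_counts for the final tally:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7],
    "B": [3, 1, 2, 5, 5, 2, 1, 1, 2, 5, 3, 4, 4, 3, 2, 5, 1, 3, 5, 4, 3],
})


def pair_counts(frame):
    """Count, across all groups of A, how many groups contain each unordered pair of B values.

    Hypothetical helper for illustration; not part of pandas.
    """
    counts = Counter()
    for _, values in frame.groupby("A")["B"]:
        # set() drops repeated B values within a group, and sorted() fixes the
        # order so that (1, 2) and (2, 1) end up under the same tuple key.
        unique_sorted = sorted(set(values.tolist()))
        counts.update(combinations(unique_sorted, 2))
    return counts


counts = pair_counts(df)
# Pairs ordered from most to least frequent; the order among ties is not meaningful.
print(counts.most_common())
```

For the sample data, the pairs (1, 2), (1, 5) and (2, 5) each occur in four groups, and (3, 4) and (3, 5) each occur in three. Tuples are used as keys because, unlike sets, they are hashable, which is also why grouping on a column of sets fails.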
Upvotes: 2