user3379546

Reputation: 87

Group by combinations of column values

I have a dataframe like this

+---+---+
|  A|  B|
+---+---+
|  1|  3|
|  1|  1|
|  1|  2|
|  1|  5|
|  2|  5|
|  2|  2|
|  2|  1|
|  3|  1|
|  3|  2|
|  3|  5|
|  4|  3|
|  4|  4|
|  5|  4|
|  5|  3|
|  6|  2|
|  6|  5|
|  6|  1|
|  6|  3|
|  7|  5|
|  7|  4|
|  7|  3|
+---+---+

I want to count how often two values of B occur together for the same A, so that I can find the most common combination of two values of B (the order doesn't matter). I want the result to be [1,2],[1,5],[1,2] and [3,4], as the values of B that appear together most often (I mean for the same A). I have tried this:

oc=pd.DataFrame(columns=['A','B_combination'])
oc['B_combination']=df.astype('str').groupby('A')['B'].agg([ ';'.join,lambda x: set(x.tolist())])['<lambda_0>'].values
oc['A']=df.astype('str').groupby('A')['B'].agg([ ';'.join,lambda x: set(x.tolist())])['<lambda_0>'].index 

to get the distinct combinations, like this:

|A |B_combination|
---+--------------
|1 |{2, 1, 3, 5} |
|2 |{2, 1, 5}    |
|3 |{2, 1, 5}    |
|4 |{4, 3}       |
|5 |{4, 3}       |
|6 |{2, 1, 3, 5} |
|7 |{4, 3, 5}    |

But when I apply

oc.groupby('B_combination').count()

to get the most common combination, it doesn't work because each value is a set. I tried converting it to a list, but that didn't work either.

Upvotes: 3

Views: 73

Answers (2)

Quang Hoang

Reputation: 150825

Let's try itertools.combinations with groupby():

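from itertools import combinations

# For each A, build every unordered pair of B values and count them
(df.groupby('A')['B']
   .apply(lambda s: pd.Series([tuple(sorted(p)) for p in combinations(s, 2)]).value_counts())
   .reset_index()
)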

Output:

    A level_1  B
0   1  (3, 5)  1
1   1  (2, 5)  1
2   1  (3, 1)  1
3   1  (3, 2)  1
4   1  (1, 5)  1
5   1  (1, 2)  1
6   2  (2, 1)  1
7   2  (5, 1)  1
8   2  (5, 2)  1
9   3  (2, 5)  1
10  3  (1, 5)  1
11  3  (1, 2)  1
12  4  (3, 4)  1
13  5  (4, 3)  1
14  6  (5, 3)  1
15  6  (2, 1)  1
16  6  (2, 3)  1
17  6  (1, 3)  1
18  6  (2, 5)  1
19  6  (5, 1)  1
20  7  (5, 3)  1
21  7  (5, 4)  1
22  7  (4, 3)  1
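
The output above counts pairs within each A, so every count is 1. To find the pairs that occur together most often, the counts still have to be totalled across all values of A. Here is a minimal sketch of one way to do that, assuming the sample data from the question and that a pair should count at most once per A. The names pair_counts, top and most_common are illustrative, not from the original answer.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7],
    "B": [3, 1, 2, 5, 5, 2, 1, 1, 2, 5, 3, 4, 4, 3, 2, 5, 1, 3, 5, 4, 3],
})

pair_counts = Counter()
for _, b_values in df.groupby("A")["B"]:
    # sorted(set(...)) makes each pair order-independent and
    # ensures it is counted at most once per value of A.
    unique_sorted = sorted(set(b_values.tolist()))
    pair_counts.update(combinations(unique_sorted, 2))

# Keep every pair that ties for the highest count.
top = max(pair_counts.values())
most_common = sorted(pair for pair, n in pair_counts.items() if n == top)

print(pair_counts.most_common())  # all pairs, most frequent first
print(most_common)                # [(1, 2), (1, 5), (2, 5)]
```

On this data the three tied pairs each appear under four values of A. Because the Counter keys are sorted tuples, they are hashable, which avoids the problem the question ran into when grouping by sets.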

Upvotes: 2

roddar92

Reputation: 366

df.groupby('A')['B'].apply(set).reset_index(name='B_combination')
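
A plain set is not hashable, which is likely why grouping on B_combination fails in the question. A hedged sketch of one workaround is to build a frozenset per A instead, since frozensets are hashable and can be counted. It assumes the sample data from the question, and the name combo_counts is illustrative. Note that this counts whole sets of B values per A, not pairs of values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7],
    "B": [3, 1, 2, 5, 5, 2, 1, 1, 2, 5, 3, 4, 4, 3, 2, 5, 1, 3, 5, 4, 3],
})

# frozenset is hashable, unlike set, so the column can be counted or grouped.
oc = df.groupby("A")["B"].apply(frozenset).reset_index(name="B_combination")

# How many values of A share exactly the same set of B values.
combo_counts = oc["B_combination"].value_counts()
print(combo_counts)
```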

Upvotes: 0
