Andrew Louis
Andrew Louis

Reputation: 303

pandas dataframe group by particular values

Suppose I have a pandas dataframe of transactions that looks like this:

+----------+----------+----------+---------+
|  Owner   |  Seller  | Mediator |  Buyer  |
+----------+----------+----------+---------+
| 'andrew' | 'bob'    | 'tom'    | 'john'  |
| 'andrew' | 'andrew' | 'bill'   | 'jason' |
| 'andrew' | 'bill'   |  'bill'  | 'tom'   |
+----------+----------+----------+---------+

I want to perform a weird groupby- I want to group by people's names based on any involvement in the transactions. So the output would be:

+----------+-------+
|   Name   | Count |
+----------+-------+
| 'andrew' |     3 |
| 'bob'    |     1 |
| 'tom'    |     2 |
| 'john'   |     1 |
| 'bill'   |     2 |
| 'jason'  |     1 |
+----------+-------+

I.e., 'andrew' has a count of 3 because his name appears in 3 transactions, 'john' has a count of 1 because he only appears in 1, etc.

Any tips to go about doing this? Thanks in advance

Upvotes: 2

Views: 335

Answers (3)

jxc
jxc

Reputation: 13998

You can use unstack() to:

  1. Put all Names into one column
  2. groupby Name and count unique original-index which is level_1 after unstack() and reset_index():
    (df.unstack()
       .reset_index(name='Name')
       .groupby('Name') 
       .level_1 
       .nunique() 
       .rename('Count') 
       .reset_index())

    #Out[xx]:
    #     Name  Count
    #0  andrew      3
    #1    bill      2
    #2     bob      1
    #3   jason      1
    #4    john      1
    #5     tom      2

Upvotes: 2

kantal
kantal

Reputation: 2407

A solution with 'unique()':

df.apply(lambda row: row.unique(),axis=1) \
  .explode().value_counts() \
  .to_frame(name="Count")  \
  .rename_axis(["Name"])      

        Count
Name         
andrew      3
bill        2
tom         2
john        1
bob         1
jason       1

Upvotes: 1

James
James

Reputation: 36598

You can create a set from each row, then reshape to a vertical data stack and get the value counts.

import pandas as pd

df = pd.DataFrame({'Owner': ['andrew', 'andrew', 'andrew'],
 'Seller': ['bob', 'andrew', 'bill'],
 'Mediator': ['tom', 'bill', 'bill'],
 'Buyer': ['john', 'jason', 'tom']}
)

cnt = (
    df.apply(lambda r: pd.Series(list(set(r))), axis=1)
      .stack()
      .value_counts()
      .reset_index().rename(columns={'index': 'Name', 0: 'Count'})
)
cnt
# returns:
     Name  Count
0  andrew      3
1    bill      2
2     tom      2
3   jason      1
4    john      1
5     bob      1

Upvotes: 1

Related Questions