Reputation: 303
Suppose I have a pandas dataframe of transactions that looks like this:
+----------+----------+----------+---------+
| Owner | Seller | Mediator | Buyer |
+----------+----------+----------+---------+
| 'andrew' | 'bob' | 'tom' | 'john' |
| 'andrew' | 'andrew' | 'bill' | 'jason' |
| 'andrew' | 'bill' | 'bill' | 'tom' |
+----------+----------+----------+---------+
I want to perform a weird groupby- I want to group by people's names based on any involvement in the transactions. So the output would be:
+----------+-------+
| Name | Count |
+----------+-------+
| 'andrew' | 3 |
| 'bob' | 1 |
| 'tom' | 2 |
| 'john' | 1 |
| 'bill' | 2 |
| 'jason' | 1 |
+----------+-------+
I.e., 'andrew' has a count of 3 because his name appears in 3 transactions, 'john' has a count of 1 because he only appears in 1, etc.
Any tips to go about doing this? Thanks in advance
Upvotes: 2
Views: 335
Reputation: 13998
You can use unstack() to:
Name
and count unique original-index
which is level_1
after unstack()
and reset_index()
: (df.unstack()
.reset_index(name='Name')
.groupby('Name')
.level_1
.nunique()
.rename('Count')
.reset_index())
#Out[xx]:
# Name Count
#0 andrew 3
#1 bill 2
#2 bob 1
#3 jason 1
#4 john 1
#5 tom 2
Upvotes: 2
Reputation: 2407
A solution with 'unique()':
df.apply(lambda row: row.unique(),axis=1) \
.explode().value_counts() \
.to_frame(name="Count") \
.rename_axis(["Name"])
Count
Name
andrew 3
bill 2
tom 2
john 1
bob 1
jason 1
Upvotes: 1
Reputation: 36598
You can create a set from each row, then reshape to a vertical data stack and get the value counts.
import pandas as pd
df = pd.DataFrame({'Owner': ['andrew', 'andrew', 'andrew'],
'Seller': ['bob', 'andrew', 'bill'],
'Mediator': ['tom', 'bill', 'bill'],
'Buyer': ['john', 'jason', 'tom']}
)
cnt = (
df.apply(lambda r: pd.Series(list(set(r))), axis=1)
.stack()
.value_counts()
.reset_index().rename(columns={'index': 'Name', 0: 'Count'})
)
cnt
# returns:
Name Count
0 andrew 3
1 bill 2
2 tom 2
3 jason 1
4 john 1
5 bob 1
Upvotes: 1