Reputation: 29
I'll try to ask my question as clearly as possible.
I have the following DataFrame:
import pandas as pd
data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player','game'])
  player        game
0      A      Soccer
1      A  Basketball
2      A   Ping pong
3      B      Soccer
4      B      Tennis
5      B      Tennis
6      C    Baseball
7      C  Volleyball
8      C   Dodgeball
Now I want to keep the games that are unique to each player, listing each one only once. Ideally as a list, but that's not a big deal.
For example, players A and B both play Soccer, so I don't want Soccer in the output. Tennis appears twice, but both times for player B, so it would be in the output.
I'd want the output to be:
  player        game
0      A  Basketball
1      A   Ping pong
2      B      Soccer
3      B      Tennis
4      C    Baseball
5      C  Volleyball
6      C   Dodgeball
Or like this:
  player                               game
0      A            [Basketball, Ping pong]
1      B                   [Soccer, Tennis]
2      C  [Baseball, Volleyball, Dodgeball]
Thank you for your help!
Upvotes: 1
Views: 63
Reputation: 862511
It seems you need to remove duplicates in the 'game' column, keeping the last occurrence, with DataFrame.drop_duplicates. Then, if you need lists, aggregate each player's games with list:
df = (df.drop_duplicates('game', keep='last')
        .groupby('player')['game']
        .agg(list)
        .reset_index())
print(df)
  player                               game
0      A            [Basketball, Ping pong]
1      B                   [Soccer, Tennis]
2      C  [Baseball, Volleyball, Dodgeball]
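If you want the flat form from the question instead of lists, the same drop_duplicates call followed by reset_index(drop=True) should be enough. The sketch below also shows a stricter reading, which is only an assumption based on the sentence "I don't want soccer in the output": it drops every game played by more than one player. The names flat, players_per_game and strict are just illustrative.

```python
import pandas as pd

data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player', 'game'])

# Flat form: keep the last occurrence of each game, then renumber the rows.
flat = df.drop_duplicates('game', keep='last').reset_index(drop=True)
print(flat)

# Stricter reading (an assumption, not what the expected output shows):
# count distinct players per game and keep only games with exactly one,
# then drop repeats of the same (player, game) pair.
players_per_game = df.groupby('game')['player'].transform('nunique')
strict = (df[players_per_game == 1]
          .drop_duplicates(['player', 'game'])
          .reset_index(drop=True))
print(strict)
```

With keep='last', Soccer stays with player B, which matches the expected output in the question. The stricter variant removes Soccer entirely and keeps Tennis once for player B.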
Upvotes: 2