Reputation: 195
I have a pandas DataFrame with two columns, Name and Car, of every car that is owned in a city,
Name Car
0 Alice Toyota
1 Bob Nissan
2 Charlie Toyota
3 Dave Ford
4 Eve Nissan
5 Bob Ford
and I want to make a summary table
Name Toyota Nissan Ford
0 Alice 1 0 1
1 Bob 0 1 0
2 Charlie 1 0 0
3 Dave 0 0 1
4 Eve 0 1 0
I've been trying groupby, count, apply, transform, but I'm just too new to the game...
Actually, the brands are numbered, and it would be ideal to have a way to address them as a Series, e.g., get whole rows as Series. Any help is appreciated.
Upvotes: 1
Views: 108
Reputation: 210982
use pivot_table() function for that:
In [30]: df.pivot_table(index=['Name'], columns=['Car'], aggfunc=len, fill_value=0)
Out[30]:
Car Ford Nissan Toyota
Name
Alice 0 0 1
Bob 1 1 0
Charlie 0 0 1
Dave 1 0 0
Eve 0 1 0
or if you don't want to have Name
as index:
In [31]: df.pivot_table(index=['Name'], columns=['Car'], aggfunc=len, fill_value=0).reset_index()
Out[31]:
Car Name Ford Nissan Toyota
0 Alice 0 0 1
1 Bob 1 1 0
2 Charlie 0 0 1
3 Dave 1 0 0
4 Eve 0 1 0
alternatively if you want to have just a boolean matrix use get_dummies()
- it won't count duplicates:
In [33]: pd.get_dummies(df.set_index('Name'))
Out[33]:
Car_Ford Car_Nissan Car_Toyota
Name
Alice 0.0 0.0 1.0
Bob 0.0 1.0 0.0
Charlie 0.0 0.0 1.0
Dave 1.0 0.0 0.0
Eve 0.0 1.0 0.0
Bob 1.0 0.0 0.0
Upvotes: 0