Alex Monras
Alex Monras

Reputation: 195

count distinct occurrences in pandas

I have a pandas DataFrame with two columns, Name and Car, of every car that is owned in a city,

  Name    Car
0 Alice   Toyota
1 Bob     Nissan
2 Charlie Toyota
3 Dave    Ford
4 Eve     Nissan
5 Bob     Ford

and I want to make a summary table

  Name    Toyota   Nissan   Ford
0 Alice   1        0        1
1 Bob     0        1        0
2 Charlie 1        0        0
3 Dave    0        0        1
4 Eve     0        1        0

I've been trying groupby, count, apply, transform, but I'm just too new to the game...

Actually, the brands are numbered, and it would be ideal to have a way to address them as a Series, e.g., get whole rows as Series. Any help is appreciated.

Upvotes: 1

Views: 108

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210982

use pivot_table() function for that:

In [30]: df.pivot_table(index=['Name'], columns=['Car'], aggfunc=len, fill_value=0)
Out[30]:
Car      Ford  Nissan  Toyota
Name
Alice       0       0       1
Bob         1       1       0
Charlie     0       0       1
Dave        1       0       0
Eve         0       1       0

or if you don't want to have Name as index:

In [31]: df.pivot_table(index=['Name'], columns=['Car'], aggfunc=len, fill_value=0).reset_index()
Out[31]:
Car     Name  Ford  Nissan  Toyota
0      Alice     0       0       1
1        Bob     1       1       0
2    Charlie     0       0       1
3       Dave     1       0       0
4        Eve     0       1       0

alternatively if you want to have just a boolean matrix use get_dummies() - it won't count duplicates:

In [33]: pd.get_dummies(df.set_index('Name'))
Out[33]:
         Car_Ford  Car_Nissan  Car_Toyota
Name
Alice         0.0         0.0         1.0
Bob           0.0         1.0         0.0
Charlie       0.0         0.0         1.0
Dave          1.0         0.0         0.0
Eve           0.0         1.0         0.0
Bob           1.0         0.0         0.0

Upvotes: 0

Related Questions