count distinct occurrences in pandas

Question

I have a pandas DataFrame with two columns, Name and Car, listing every car that is owned in a city:

  Name    Car
0 Alice   Toyota
1 Bob     Nissan
2 Charlie Toyota
3 Dave    Ford
4 Eve     Nissan
5 Bob     Ford

and I want to make a summary table

  Name    Toyota   Nissan   Ford
0 Alice   1        0        0
1 Bob     0        1        1
2 Charlie 1        0        0
3 Dave    0        0        1
4 Eve     0        1        0

I've been trying groupby, count, apply, transform, but I'm just too new to the game...

Actually, the brands are numbered, and it would be ideal to have a way to address them as a Series, e.g., get whole rows as Series. Any help is appreciated.

MaxU - stand with Ukraine · Accepted Answer

Use the pivot_table() function for that:

In [30]: df.pivot_table(index=['Name'], columns=['Car'], aggfunc=len, fill_value=0)
Out[30]:
Car      Ford  Nissan  Toyota
Name
Alice       0       0       1
Bob         1       1       0
Charlie     0       0       1
Dave        1       0       0
Eve         0       1       0

Or, if you don't want Name as the index:

In [31]: df.pivot_table(index=['Name'], columns=['Car'], aggfunc=len, fill_value=0).reset_index()
Out[31]:
Car     Name  Ford  Nissan  Toyota
0      Alice     0       0       1
1        Bob     1       1       0
2    Charlie     0       0       1
3       Dave     1       0       0
4        Eve     0       1       0
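
For comparison, the same count table can also be built with pd.crosstab(), which tabulates two columns directly. This is a minimal sketch rather than part of the original answer; the DataFrame is rebuilt from the sample data in the question, and the names counts and bob are only illustrative. It also shows one way to pull a whole row out as a Series, as the question asks.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave", "Eve", "Bob"],
    "Car": ["Toyota", "Nissan", "Toyota", "Ford", "Nissan", "Ford"],
})

# Count how often each (Name, Car) pair occurs; pairs that never occur become 0.
counts = pd.crosstab(df["Name"], df["Car"])

# One owner's row as a Series indexed by brand.
bob = counts.loc["Bob"]

print(counts)
print(bob)
```

Because Name is the index of the result, counts.loc[...] looks rows up by owner; call reset_index() first if Name should be an ordinary column.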

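Alternatively, if you just want an indicator matrix, use get_dummies(). It does not aggregate anything: each original row stays a separate row, so an owner with several cars (Bob here) appears more than once: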

In [33]: pd.get_dummies(df.set_index('Name'))
Out[33]:
         Car_Ford  Car_Nissan  Car_Toyota
Name
Alice         0.0         0.0         1.0
Bob           0.0         1.0         0.0
Charlie       0.0         0.0         1.0
Dave          1.0         0.0         0.0
Eve           0.0         1.0         0.0
Bob           1.0         0.0         0.0
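
If one row per owner is wanted from the get_dummies() result, the repeated index entries can be collapsed with a groupby on the index. The following is a sketch under assumptions, not part of the original answer: dtype=int is passed so the indicator columns are integers, and the names dummies and per_owner are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave", "Eve", "Bob"],
    "Car": ["Toyota", "Nissan", "Toyota", "Ford", "Nissan", "Ford"],
})

# One indicator row per original row; Bob still appears twice here.
dummies = pd.get_dummies(df.set_index("Name"), dtype=int)

# Sum the indicator rows that share an index label to get one row per owner.
per_owner = dummies.groupby(level="Name").sum()

print(per_owner)
```

Using .max() instead of .sum() would keep the result a pure 0/1 matrix even if the same owner and brand were listed more than once.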
