Satya
Satya

Reputation: 5907

finding dataframe group count

i have a dataframe like

       customer         genre
0      cust1           |BIOPIC|DRAMA|
1      cust2           |COMEDY|DRAMA|ROMANCE|
2      cust1           |DRAMA|THRILLER|
3      cust3           |COMEDY|HORROR|
4      cust4           |HISTORY|ROMANCE|WAR|
5      cust3           |ADVENTURE|COMEDY|
6      cust2           |ACTION|DRAMA|THRILLER|
7      cust1           |CRIME|DRAMA|THRILLER|
8      cust3           |HISTORY|ROMANCE|WAR|
9      cust2           |ADVENTURE|COMEDY|
10     cust4           |BIOPIC|DRAMA|HISTORY|THRILLER|

I need = how many times each customer did transaction(customer count) and their respective genre count.Eg. cust1 DRAMA = 3, cust1 THRILLER = 2,like wise for each customer's.

I did found the each customer count by

df = df.groupby(['cust']).size()

then i know how to filter out the genres and getting the count if it was within a LIST , but i am getting confused with how to proceed with each group of customer and getting the count for each customer's indivisual genre count.

filtering(|) from genre expression and getting the fields out.

please suggest.

Upvotes: 0

Views: 74

Answers (1)

Chris McDonald
Chris McDonald

Reputation: 186

The feature str.get_dummies is perfect for this sort of thing! It works just like the dataframe version but on strings and allows you to specify a delimiter. Assuming your dataframe is named df, then the below code does what you're after:

import pandas as pd
import numpy as np   
df = pd.concat([df, df.Genres.str.get_dummies(sep='|')], axis=1)
df = df.groupby("Customers").aggregate(np.sum)

print(df)

output:

           ACTION  ADVENTURE  BIOPIC  COMEDY  CRIME  DRAMA  HISTORY  HORROR  \
Customers
cust1           0          0       1       0      1      3        0       0
cust2           1          1       0       2      0      2        0       0
cust3           0          1       0       2      0      0        1       1
cust4           0          0       1       0      0      1        2       0

To explain a bit, str.get_dummies method makes a new column for every value it sees in the column specified and then marks a 1 for the values present and a 0 elsewhere. The GroupBy and Aggregate methods make clusters according to the customers and add up the columns. Aggregate will silently drop columns which it can't add, in this case the original Genres column.

Upvotes: 1

Related Questions