CodeMaster

Reputation: 449

Python - Get percentage based on column values

I want to use the percentage of releases in a year as a measure of a genre's popularity in the MovieLens dataset. Sample data is shown below:

[image: sample data]

I can set the index to the year:

   df1 = df.set_index('year')

Then I can find the total per row and divide the individual cells by it to get percentages:

df1 = df.set_index('year')
df1['total'] = df1.iloc[:, 1:4].sum(axis=1)
df2 = df1.drop('movie', axis=1)
df2 = df2.div(df2['total'], axis=0) * 100
df2.head()

[image: output of df2.head()]

Now, what's the best way to get the percentage of releases per year? I believe I should use 'groupby' and then a heatmap?

Upvotes: 2

Views: 724

Answers (1)

Grayrigel

Reputation: 3594

You can use the groupby method for this:

import pandas as pd
import numpy as np

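df = pd.DataFrame({'movie':['Movie1','Movie2','Movie3'], 'action':[1,0,0], 'com':[np.nan,np.nan,1], 'drama':[1,1,np.nan], 'year':[1994,1994,1995]})

# Treat missing genre flags as 0
df.fillna(0, inplace=True)
# Sum only the genre columns per year (summing the 'movie' strings fails in recent pandas),
# then express each sum as a percentage of all movies in the frame
print((df.groupby('year')[['action','com','drama']].sum() / len(df)) * 100)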

Output:

         action        com      drama
year                                 
1994  33.333333   0.000000  66.666667
1995   0.000000  33.333333   0.000000
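Note that dividing by len(df) gives each genre's share of all movies in the frame. If the percentage should instead be relative to the number of releases in each year, one possible variation (a sketch on the same toy data, not part of the original answer) is to take the per-year mean of the 0/1 genre flags. The names genres and per_year are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'movie': ['Movie1', 'Movie2', 'Movie3'],
    'action': [1, 0, 0],
    'com': [np.nan, np.nan, 1],
    'drama': [1, 1, np.nan],
    'year': [1994, 1994, 1995],
})

genres = ['action', 'com', 'drama']  # illustrative name for the genre columns
df[genres] = df[genres].fillna(0)    # missing flag means "not this genre"

# The mean of 0/1 flags within a year is the fraction of that year's
# movies tagged with the genre; multiply by 100 for a percentage.
per_year = df.groupby('year')[genres].mean() * 100
print(per_year)
```

On this toy data, 1994 has two movies, so action comes out at 50% and drama at 100% for that year, while 1995 has a single comedy.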

You can also use pandas' built-in styling for a colored representation of the DataFrame (or use seaborn):

df = df.groupby('year')[['action','com','drama']].sum() / len(df) * 100
df.style.background_gradient(cmap='viridis')

Output: [image: DataFrame styled with a viridis background gradient]
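For the seaborn alternative mentioned above, a minimal sketch could look like the following. It assumes seaborn and matplotlib are installed. The annot and fmt settings, the axis label, and the variable name pct are illustrative choices rather than anything from the original answer.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same toy data as above, with missing genre flags already set to 0
df = pd.DataFrame({
    'action': [1, 0, 0],
    'com': [0, 0, 1],
    'drama': [1, 1, 0],
    'year': [1994, 1994, 1995],
})

# Percentage of all movies per genre and year
pct = df.groupby('year').sum() / len(df) * 100

# One cell per (year, genre); annot=True writes the value into each cell
ax = sns.heatmap(pct, annot=True, fmt='.1f', cmap='viridis')
ax.set_xlabel('genre')
plt.tight_layout()
# Call plt.show() or plt.savefig(...) here to display or save the figure
```

Unlike the Styler output, which renders only in HTML-capable front ends such as Jupyter, this produces a regular matplotlib figure.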

Upvotes: 2
