I want to evaluate the 'percentage of releases in a year' as a measure of a genre's popularity in the MovieLens dataset.
Sample data is shown below:
I can set the year as the index:
df1 = df.set_index('year')
Then I can compute the total per row and divide each cell by it to get percentages:
df1 = df.set_index('year')
# Row total over the genre columns (iloc positions 1-3, right after 'movie')
df1['total'] = df1.iloc[:, 1:4].sum(axis=1)
df2 = df1.drop('movie', axis=1)
# Divide every column (including 'total') by the row total
df2 = df2.div(df2['total'], axis=0) * 100
df2.head()
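For reference, here is a self-contained sketch of that row-wise computation on a small hypothetical frame in the same one-hot layout. It selects the genre columns by name instead of the positional slice iloc[:,1:4], which silently depends on column order; `genre_cols` is an illustrative name, not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the one-hot layout the question assumes
df = pd.DataFrame({
    'movie': ['Movie1', 'Movie2', 'Movie3'],
    'action': [1, 0, 0],
    'com': [np.nan, np.nan, 1],
    'drama': [1, 1, np.nan],
    'year': [1994, 1994, 1995],
})

df1 = df.set_index('year').fillna(0)
genre_cols = ['action', 'com', 'drama']  # illustrative: list your own genre columns

# Each movie's genre flags as a percentage of that movie's flag count (rows sum to 100)
row_pct = df1[genre_cols].div(df1[genre_cols].sum(axis=1), axis=0) * 100
print(row_pct)
```

A movie with no genre flags at all would have a row total of 0 and produce NaN in every cell, so filter such rows out first if your data has them.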
Now, what's the best way to get the percentage of releases per genre in each year? Should I use groupby and then a heatmap?
You can use the groupby method:
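import pandas as pd
import numpy as np
df = pd.DataFrame({'movie':['Movie1','Movie2','Movie3'], 'action':[1,0,0], 'com':[np.nan,np.nan,1], 'drama':[1,1,np.nan], 'year':[1994,1994,1995]})
df.fillna(0, inplace=True)
# groupby('year') already makes 'year' the index, so no set_index call is needed.
# Drop the text column 'movie' first: newer pandas would otherwise concatenate the titles and the division would fail.
print((df.drop(columns='movie').groupby('year').sum() / len(df)) * 100)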
Output:
         action        com      drama
year
1994  33.333333   0.000000  66.666667
1995   0.000000  33.333333   0.000000
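Note that dividing by len(df) normalizes by the total number of movies in the whole dataset, so each cell is a share of all releases, not of that year's releases. If you instead want the percentage of each year's releases that carry a given genre, one option (a sketch on the same toy frame) is to average the 0/1 flags within each year, since the mean of an indicator column is the fraction of rows where it is 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': ['Movie1', 'Movie2', 'Movie3'],
                   'action': [1, 0, 0],
                   'com': [np.nan, np.nan, 1],
                   'drama': [1, 1, np.nan],
                   'year': [1994, 1994, 1995]})
df = df.fillna(0)

# Mean of the 0/1 genre flags per year = fraction of that year's movies in the genre
per_year_pct = df.drop(columns='movie').groupby('year').mean() * 100
print(per_year_pct)

# Equivalent: divide per-year genre counts by the number of movies released that year
counts = df.drop(columns='movie').groupby('year').sum()
releases_per_year = df.groupby('year').size()
per_year_pct_alt = counts.div(releases_per_year, axis=0) * 100
```

Because a movie can belong to several genres, the columns in a row need not sum to 100; each cell answers "what share of this year's movies are in this genre".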
Also, you can use pandas' built-in styling for a colored representation of the DataFrame (or just use seaborn):
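# background_gradient needs matplotlib installed; the styled table renders in Jupyter
pct = df.drop(columns='movie').groupby('year').sum() / len(df) * 100
pct.style.background_gradient(cmap='viridis')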
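Since the question asks about a heatmap and seaborn is mentioned as an option, here is a minimal sketch with seaborn.heatmap, assuming seaborn and matplotlib are installed. It rebuilds the same toy frame and keeps the normalization by len(df) used above; the "genre" axis label and the colorbar label are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless; drop it in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'movie': ['Movie1', 'Movie2', 'Movie3'],
                   'action': [1, 0, 0],
                   'com': [np.nan, np.nan, 1],
                   'drama': [1, 1, np.nan],
                   'year': [1994, 1994, 1995]})
df = df.fillna(0)

# Genre counts per year as a percentage of all movies in the dataset
pct = df.drop(columns='movie').groupby('year').sum() / len(df) * 100

# Rows = years, columns = genres, each cell annotated with one decimal place
ax = sns.heatmap(pct, annot=True, fmt=".1f", cmap="viridis",
                 cbar_kws={"label": "% of releases"})
ax.set_xlabel("genre")
plt.tight_layout()
# In a script, call plt.show() or plt.savefig(...) to display or save the figure
```

Swap pct for a per-year normalization if you want each row to describe that year's releases only.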