This is a follow-up to my previous question:
pandas: How to plot the pie diagram for the movie counts versus genre of IMDB movies in pandas?
In that question, we plotted the movie counts for each unique genre.
My question is: how do I get a plot of 'budget' versus 'genres' in pandas?
Here is the sample code:
import pandas as pd
import numpy as np
%matplotlib inline
df = pd.DataFrame({'movie': ['A', 'B', 'C', 'D'],
                   'budget': [1000, 2000, 3000, 4000],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama', 'Mystery|Science Fiction|Drama']},
                  index=range(4))
df
Here the genres value Science Fiction|Romance|Family is actually three separate genres.
Science Fiction appears in movies A and D, so the budget for the genre Science Fiction should be 1000+4000=5000, and so on.
Upvotes: 0
Views: 1181
Reputation: 1509
Here's how you can draw a bar plot of the total budget for each genre:
genres = (df.genres.str.split('|', expand=True)
            .stack()
            .to_frame(name='genre'))
genres.index = genres.index.droplevel(1)
So genres becomes:
genre
0 Science Fiction
0 Romance
0 Family
1 Action
1 Romance
2 Family
2 Drama
3 Mystery
3 Science Fiction
3 Drama
Now perform a join and groupby to first get budget info, then sum on genre:
(genres.join(df['budget'])
       .groupby('genre')
       .sum()
       .plot(kind='bar'))
Output: a bar chart with one bar per genre showing its total budget.
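As a cross-check on the numbers behind the chart, here is a minimal sketch of the same split-then-sum idea written with `DataFrame.explode`. This is an alternative to the stack/join steps above, not part of the original answer, and it assumes pandas 0.25 or newer. The names `exploded` and `totals` are only illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'movie': ['A', 'B', 'C', 'D'],
    'budget': [1000, 2000, 3000, 4000],
    'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
               'Family|Drama', 'Mystery|Science Fiction|Drama'],
})

# Split each genres string into a list, then give every genre its own row.
# explode() repeats the movie's budget on each of those rows.
exploded = (df.assign(genre=df['genres'].str.split('|'))
              .explode('genre'))

# Sum the budget per genre; the result is a Series indexed by genre.
totals = exploded.groupby('genre')['budget'].sum()
print(totals)
```

Science Fiction comes out as 5000 here, matching the 1000+4000 example in the question. Calling `totals.plot(kind='bar')` on this Series should give the same kind of bar chart as the join/groupby version.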
Upvotes: 2