user8864088

Pandas: How to plot the total IMDb movie budget for each separate genre?

This is a follow-up to my previous question:

pandas: How to plot the pie diagram for the movie counts versus genre of IMDB movies in pandas?

In that question, we plotted the movie counts for each unique genre. My question is: how do I get a 'budget' versus 'genres' plot in pandas?

Here is the sample code:

import pandas as pd
import numpy as np 
%matplotlib inline

df = pd.DataFrame({'movie' : ['A', 'B','C','D'],
                   'budget': [1000, 2000, 3000, 4000],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama','Mystery|Science Fiction|Drama']},
                  index=range(4))
df

Here the genres value Science Fiction|Romance|Family is actually three separate genres.

Science Fiction appears in movies A and D, so the budget for the genre Science Fiction should be 1000 + 4000 = 5000, and so on for the other genres.
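As a cross-check on the totals the plot should show, here is a minimal pure-Python sketch that counts each movie's full budget once for every genre it lists. The names `movies` and `expected_totals` are illustrative and not part of the question; the rows are copied from the sample DataFrame.

```python
from collections import defaultdict

# Sample rows copied from the question: (movie, budget, pipe-separated genres).
movies = [
    ("A", 1000, "Science Fiction|Romance|Family"),
    ("B", 2000, "Action|Romance"),
    ("C", 3000, "Family|Drama"),
    ("D", 4000, "Mystery|Science Fiction|Drama"),
]

# Count each movie's full budget once for every genre it lists.
budget_by_genre = defaultdict(int)
for _movie, budget, genres in movies:
    for genre in genres.split("|"):
        budget_by_genre[genre] += budget

expected_totals = dict(budget_by_genre)
print(expected_totals)
```

For the sample data this gives 5000 for Science Fiction, 3000 for Romance, 4000 for Family, 2000 for Action, 7000 for Drama and 4000 for Mystery.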

Upvotes: 0

Views: 1181

Answers (1)

Andrey Portnoy

Reputation: 1509

Here's how you can draw a bar plot of the total budget for each genre:

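# Split the pipe-separated genres into columns, then stack them into one genre per row
genres = (df.genres.str.split('|', expand=True)
            .stack()
            .to_frame(name='genre'))

# Drop the inner index level added by stack() so the index matches df again
genres.index = genres.index.droplevel(1)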

So genres becomes:

             genre
0  Science Fiction
0          Romance
0           Family
1           Action
1          Romance
2           Family
2            Drama
3          Mystery
3  Science Fiction
3            Drama

Now join on the index to attach each movie's budget to its genres, then group by genre and sum:

(genres.join(df['budget'])
       .groupby('genre')
       .sum()
       .plot(kind='bar'))

Output:

[Image: bar chart of total budget per genre]
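A possible variation, not the approach used in this answer: assuming pandas 0.25 or later, `DataFrame.explode` can stand in for the stack, droplevel and join steps. The sketch below rebuilds the sample DataFrame from the question and computes the per-genre totals without plotting; the names `exploded` and `totals` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'movie': ['A', 'B', 'C', 'D'],
                   'budget': [1000, 2000, 3000, 4000],
                   'genres': ['Science Fiction|Romance|Family', 'Action|Romance',
                              'Family|Drama', 'Mystery|Science Fiction|Drama']})

# Split each genre string into a list, then give every genre its own row.
# The budget column is repeated for each genre of the same movie.
exploded = df.assign(genre=df['genres'].str.split('|')).explode('genre')

# Sum the budgets of all movies listed under each genre.
totals = exploded.groupby('genre')['budget'].sum()
print(totals)
```

In a notebook, calling `totals.plot(kind='bar')` on the resulting Series should draw the same kind of bar chart.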

Upvotes: 2
