Reputation: 13
I am trying to plot a histogram that shows the frequency of genre_ids across movie data. Since some movies belong to several genres, the data is currently stored as a list of ids in a pandas dataframe column, and looks like this:
genre_ids
[35]
[18]
[35, 10749]
[18, 10749]
[35, 18, 10749]
How do I plot a histogram such that the values on the axis are just the genre ids individually and not the lists themselves? I searched everywhere for this question and couldn't figure it out. So far I'm just using:
movie_data['genre_ids'].hist()
where movie_data is the data frame. I want the histogram to look like:
x
x x
x x x
35 18 10749
Instead of:
x
x x
x x x x
[35] [18,35] [18] [18,10749]
for example
Upvotes: 1
Views: 634
Reputation: 102329
You can try flattening the lists with itertools.chain and plotting the counts as a bar chart. With the sample data
import pandas as pd
from itertools import chain
df = pd.DataFrame({'genre_ids': [[35], [18], [35, 10749], [18, 10749], [35, 18, 10749]]})
the plot comes from
pd.Series(list(chain(*df['genre_ids']))).sort_values().value_counts().plot.bar()
Upvotes: 0
Reputation: 347
Since pandas 0.25.0, you can use the explode method.
movie_data['genre_ids'].explode().hist()
will do the trick.
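A histogram bins numeric ranges, so with ids as far apart as 18 and 10749 the bars may not line up with individual genres. Here is a minimal sketch of one alternative, assuming the sample data from the question: explode the lists, count each id, and draw the counts as bars. The names `flat`, `counts` and `plot_genre_counts` are illustrative and not from the original posts.

```python
import pandas as pd

# Sample data copied from the question.
movie_data = pd.DataFrame(
    {"genre_ids": [[35], [18], [35, 10749], [18, 10749], [35, 18, 10749]]}
)

# explode() gives one row per list element (pandas >= 0.25.0). The result
# has object dtype, so cast back to int before counting.
flat = movie_data["genre_ids"].explode().astype(int)

# One count per distinct genre id, ordered by id.
counts = flat.value_counts().sort_index()


def plot_genre_counts(counts):
    # Hypothetical helper: needs matplotlib installed; draws one bar per id.
    return counts.plot.bar()
```

Calling `plot_genre_counts(counts)` should then give one bar per genre id rather than one bar per numeric bin.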
Upvotes: 1
Reputation: 520
Before doing the histogram, you need to bring out the elements from the lists.
This should do the job:
from pandas import Series
movie_data['genre_ids'].apply(Series).stack().hist()
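To see what that chain produces before plotting, here is a small sketch on the question's sample data. The intermediate names `wide` and `genre_ids`, the explicit `dropna()`, and the cast back to int are my additions, not part of the answer; the assumption is that you want integer ids rather than the floats that the NaN padding introduces.

```python
import pandas as pd

# Sample data copied from the question.
movie_data = pd.DataFrame(
    {"genre_ids": [[35], [18], [35, 10749], [18, 10749], [35, 18, 10749]]}
)

# apply(pd.Series) spreads each list across columns 0, 1, 2 and pads the
# shorter lists with NaN, which makes every column float.
wide = movie_data["genre_ids"].apply(pd.Series)

# stack() moves the columns into rows. dropna() removes any NaN padding
# that is still present; it is a no-op if stack() already dropped it.
genre_ids = wide.stack().dropna().astype(int)
```

`genre_ids` is then a flat Series of individual ids, which can be passed to `.hist()` as in the answer or to `.value_counts().plot.bar()`.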
Upvotes: 1