Karina Chan

Reputation: 13

How to plot a histogram of a pandas series of lists

I am trying to plot a histogram that shows the frequency of each genre_id across my movie data. The data is stored as a list of ids in a pandas DataFrame column, because some movies belong to several genres, and looks like this:

genre_ids
[35]                         
[18]                          
[35, 10749]                   
[18, 10749]                   
[35, 18, 10749] 

How do I plot a histogram so that the x-axis shows the individual genre ids rather than the lists themselves? I searched everywhere for this and couldn't figure it out. So far I'm just using:

movie_data['genre_ids'].hist()

Here movie_data is the DataFrame, and I want the histogram to look like:

x
x   x  
x   x  x
35 18 10749 

Instead of:

x
x              x
x      x       x      x
[35] [18,35] [18] [18,10749]  

for example, where each distinct list gets its own bar.
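The desired plot is simply a count of each individual id, where a movie with several genres adds one to each of its genres. A minimal standard-library sketch, assuming the five sample rows above are the whole column, shows the numbers the bars should have:

```python
from collections import Counter

# Sample genre_ids column from the question, as plain Python lists
genre_ids = [[35], [18], [35, 10749], [18, 10749], [35, 18, 10749]]

# Count every id across all lists, so a multi-genre movie adds to several bars
counts = Counter(gid for ids in genre_ids for gid in ids)
print(sorted(counts.items()))  # [(18, 3), (35, 3), (10749, 3)]
```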

Upvotes: 1

Views: 634

Answers (3)

ThomasIsCoding

Reputation: 102329

You can try

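from itertools import chain
import pandas as pd
# Flatten the lists into one Series of ids, count each id, then order the bars by id
pd.Series(list(chain.from_iterable(df['genre_ids']))).value_counts().sort_index().plot.bar()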

which shows a bar chart with one bar per genre id.

Data

import pandas as pd

df = pd.DataFrame({'genre_ids': [[35], [18], [35, 10749], [18, 10749], [35, 18, 10749]]})
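The one-liner can be taken apart without plotting to see what each step produces. This is a sketch on the same sample df; itertools.chain.from_iterable is a standard-library function that gives the same result as chain(*df['genre_ids']) without unpacking every row into call arguments.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'genre_ids': [[35], [18], [35, 10749], [18, 10749], [35, 18, 10749]]})

# chain.from_iterable walks the lists one after another and yields single ids
flat = pd.Series(list(chain.from_iterable(df['genre_ids'])))
print(flat.tolist())  # [35, 18, 35, 10749, 18, 10749, 35, 18, 10749]

# value_counts orders by frequency; sort_index puts the ids in numeric order for the x-axis
counts = flat.value_counts().sort_index()
print(counts.index.tolist(), counts.tolist())  # [18, 35, 10749] [3, 3, 3]
```

Calling counts.plot.bar() (matplotlib required) then draws one bar per id.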

Upvotes: 0

roshambo

Reputation: 347

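Since pandas 0.25.0, you can use Series.explode, which puts each list element on its own row:

movie_data['genre_ids'].explode().value_counts().sort_index().plot.bar()

This draws one bar per genre id. Calling .hist() on the exploded Series instead would treat the ids as numbers and group them into 10 equal-width bins by default, so with this data 18 and 35 would end up in the same bar.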
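A sketch of what explode produces and why counting beats binning here, using a hypothetical movie_data built from the question's sample rows. The np.histogram call mirrors the default 10-bin behaviour that .hist() relies on; it is only there to illustrate the binning, not part of the recommended approach.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for movie_data, built from the question's sample rows
movie_data = pd.DataFrame({'genre_ids': [[35], [18], [35, 10749], [18, 10749], [35, 18, 10749]]})

# explode emits one row per list element and repeats the original row label
exploded = movie_data['genre_ids'].explode()
print(exploded.index.tolist())  # [0, 1, 2, 2, 3, 3, 4, 4, 4]

# The exploded values have object dtype, so cast to int before numeric work
ids = exploded.astype(int)

# Default-style binning: 10 equal-width bins spanning 18..10749, so 18 and 35 share the first bin
hist_counts, edges = np.histogram(ids, bins=10)
print(hist_counts.tolist())  # [6, 0, 0, 0, 0, 0, 0, 0, 0, 3]

# One count per distinct id, ordered by id, which is what a per-genre bar chart needs
counts = ids.value_counts().sort_index()
print(counts.index.tolist(), counts.tolist())  # [18, 35, 10749] [3, 3, 3]
```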

Upvotes: 1

Suraj Motaparthy

Reputation: 520

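Before plotting, you need to unpack the elements from the lists.

This should do the job:

from pandas import Series
# apply(Series) spreads each list across columns; stack() turns them back into one long Series
movie_data['genre_ids'].apply(Series).stack().dropna().astype(int).value_counts().sort_index().plot.bar()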
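A step-by-step sketch of the apply/stack route on a hypothetical movie_data built from the question's sample rows. It shows why the ids come out as floats (the NaN padding for shorter lists) and why they are cast back to int before counting. The dropna() call is a defensive assumption: older stack() implementations drop NaN on their own, and the extra call is harmless there.

```python
import pandas as pd

# Hypothetical stand-in for movie_data, built from the question's sample rows
movie_data = pd.DataFrame({'genre_ids': [[35], [18], [35, 10749], [18, 10749], [35, 18, 10749]]})

# apply(pd.Series) spreads each list across columns 0, 1, 2; shorter lists are padded with NaN
wide = movie_data['genre_ids'].apply(pd.Series)
print(wide.shape)  # (5, 3)

# stack() turns the columns back into one long Series; dropna() removes any remaining padding.
# The NaNs have already forced the ids to a float dtype at this point.
stacked = wide.stack().dropna()

# Cast back to int so bar labels read 18 rather than 18.0, then count each id
counts = stacked.astype(int).value_counts().sort_index()
print(counts.index.tolist(), counts.tolist())  # [18, 35, 10749] [3, 3, 3]
```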

Upvotes: 1
