JackX

Reputation: 49

Plotting multiple lines in one graph with time in months on the x-axis, number of occurrences on the y-axis

I'm currently working on a movie dataset, which I want to reduce to the number of watches per day per genre. So far I have filtered it to a dataframe with 2 columns (besides the index), namely 'Date' and 'Genre'. 'Date' has dtype datetime64[ns] and 'Genre' has dtype object.

To visualize this:

Date           Genre
2018-01-01     romance
2018-01-01     fiction
2018-01-01     romance
2018-01-02     drama
2018-01-02     romance
2018-01-02     fiction    
2018-01-02     romance
2018-01-03     romance
2018-01-03     drama

The list goes on for the whole of 2018. Each row is one watch, so on 2018-01-01, for example, three movies were watched: two in the genre romance and one in the genre fiction.

Question:

I want to plot a multi-line graph in which each line represents a different genre. The x-axis should show time, labeled in months, and the y-axis should show the number of watches. In other words, I want to plot every genre in the same graph, showing the number of watches of that genre per day, with the x-axis labeled in months.

What I've tried so far:

Filtering the movie dataframe per genre, counting the watches per date, and storing the result in a new variable:

df_2018_rom = (
    df_movies_2018[df_movies_2018.Genre == 'romance']
    .groupby(['Genre', 'Date']).Date.count()
)

But I still can't seem to plot the graph I want.

Thanks for any help in advance!

Upvotes: 2

Views: 644

Answers (1)

Chris Adams

Reputation: 18647

You could do this simply by reshaping your DataFrame with pandas.crosstab:

Example

import pandas as pd

# if needed - make sure 'Date' is the correct dtype
df_movies_2018['Date'] = pd.to_datetime(df_movies_2018['Date'])

# Filter to genres you're interested in
genres_to_plot = ['romance', 'drama', 'fiction']
df = df_movies_2018[df_movies_2018.Genre.isin(genres_to_plot)]

df_cross = pd.crosstab(df.Date, df.Genre)
df_cross.plot()


For reference, df_cross looks like:

Genre       drama  fiction  romance
Date                               
2018-01-01      0        1        2
2018-01-02      1        1        2
2018-01-03      1        0        1
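
To check the table above, here is a minimal sketch that rebuilds it from the nine sample rows in the question. The name df_sample is only illustrative and stands in for df_movies_2018. It also shows that a groupby followed by unstack, which is close to what the question already tried, should give the same counts:

```python
import pandas as pd

# The nine sample rows from the question; df_sample is an illustrative name.
df_sample = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-01-01", "2018-01-01", "2018-01-01",
        "2018-01-02", "2018-01-02", "2018-01-02", "2018-01-02",
        "2018-01-03", "2018-01-03",
    ]),
    "Genre": [
        "romance", "fiction", "romance",
        "drama", "romance", "fiction", "romance",
        "romance", "drama",
    ],
})

# One row per date, one column per genre; each cell is the number of watches.
df_cross = pd.crosstab(df_sample["Date"], df_sample["Genre"])

# Same counts via groupby: count rows per (Date, Genre) pair,
# then pivot Genre into columns, filling missing pairs with 0.
df_grouped = (
    df_sample.groupby(["Date", "Genre"])
    .size()
    .unstack(fill_value=0)
)

print(df_cross)
```

crosstab is the shorter of the two; the groupby version may be handy if you want to aggregate something other than a plain row count.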

The pandas DataFrame.plot method treats each column in a DataFrame as an individual series (line), with the index providing the default x-axis values.
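
Since the question asks for the x-axis to be labeled in months, one possible way to get that is to pass a matplotlib axis to plot and set a month locator and formatter on it. This is only a sketch: the data below is randomly generated as a stand-in for df_movies_2018, and the figure size and axis labels are my own choices.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical stand-in for df_movies_2018: 5000 random watches across 2018.
rng = np.random.default_rng(0)
days = pd.date_range("2018-01-01", "2018-12-31", freq="D")
df = pd.DataFrame({
    "Date": days[rng.integers(0, len(days), size=5000)],
    "Genre": rng.choice(["romance", "drama", "fiction"], size=5000),
})

# Daily watch counts: one row per date, one column per genre.
df_cross = pd.crosstab(df["Date"], df["Genre"])

fig, ax = plt.subplots(figsize=(10, 4))

# x_compat=True tells pandas to skip its own date tick handling,
# so the matplotlib.dates locator and formatter below take effect.
df_cross.plot(ax=ax, x_compat=True)

# One major tick per month, labeled with the abbreviated month name.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.set_xlabel("Month")
ax.set_ylabel("Number of watches")

# plt.show()  # uncomment when running as a script
```

The lines still show daily counts; only the tick labels are monthly. If the daily lines look too noisy, resampling df_cross to weekly sums before plotting might be worth trying.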

Upvotes: 2
