Reputation: 49
I'm currently working on a movie dataset, which I have filtered to the number of watches per day per genre. I have filtered it to a dataframe as follows:
I created a dataframe with 2 columns (besides the index), namely 'Date' and 'Genre'. The dtype of 'Date' is datetime64[ns] and 'Genre' is an object.
To visualize this:
Date Genre
2018-01-01 romance
2018-01-01 fiction
2018-01-01 romance
2018-01-02 drama
2018-01-02 romance
2018-01-02 fiction
2018-01-02 romance
2018-01-03 romance
2018-01-03 drama
The list goes on for the whole of 2018. It shows that, based on the dataset, three movies were watched on 2018-01-01, in the genres romance, fiction, and romance.
Question:
I want to plot a multiple-line graph in which each line represents a different genre and shows the number of watches of that genre per day. All genres should be in the same graph, with time on the x-axis (labeled in months) and the number of watches on the y-axis.
What I've tried so far:
Filtering the movie dataframe by genre, counting the watches per date, and storing the result in a new variable:
df_2018_rom = (
    df_movies_2018[df_movies_2018.Genre == 'romance']
    .groupby(['Genre', 'Date']).Date.count()
)
But I still can't seem to plot the graph I want.
Thanks for any help in advance!
Upvotes: 2
Views: 644
Reputation: 18647
You could do this simply by reshaping your DataFrame with pandas.crosstab:
import pandas as pd

# If needed, make sure 'Date' has a datetime dtype
df_movies_2018['Date'] = pd.to_datetime(df_movies_2018['Date'])
# Filter to the genres you're interested in
genres_to_plot = ['romance', 'drama', 'fiction']
df = df_movies_2018[df_movies_2018.Genre.isin(genres_to_plot)]
# Count watches per date (rows) and genre (columns)
df_cross = pd.crosstab(df.Date, df.Genre)
# Draw one line per genre, with the dates on the x-axis
df_cross.plot()
For reference, df_cross looks like:
Genre drama fiction romance
Date
2018-01-01 0 1 2
2018-01-02 1 1 2
2018-01-03 1 0 1
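If you want to check this table yourself, here is a minimal sketch that rebuilds it from the nine sample rows in the question. The frame name `df_sample` and the `groupby` variant `df_counts` are only illustrative; the latter is included because it is close to the attempt in the question and should give the same counts.

```python
import pandas as pd

# The nine sample rows from the question (illustrative frame name)
df_sample = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-01-01"] * 3 + ["2018-01-02"] * 4 + ["2018-01-03"] * 2
    ),
    "Genre": [
        "romance", "fiction", "romance",
        "drama", "romance", "fiction", "romance",
        "romance", "drama",
    ],
})

# Rows are dates, columns are genres, values are watch counts
df_cross = pd.crosstab(df_sample["Date"], df_sample["Genre"])

# Same counts via groupby: count rows per (Date, Genre) pair, then
# pivot the genres into columns and fill missing combinations with 0
df_counts = (
    df_sample.groupby(["Date", "Genre"]).size().unstack(fill_value=0)
)

print(df_cross)
```

Note that a date with no rows at all does not appear in either result, so days without any watches are skipped rather than shown as zero.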
The pandas DataFrame.plot method treats each column in a DataFrame as an individual series (line), with the index as the default x-axis values.
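The question also asks for the x-axis to be labeled in months. One way to get that, as a sketch rather than the only option, is to pass `x_compat=True` to the plot call and then set a month locator and formatter from `matplotlib.dates`. The sample data below is made up purely so the example runs on its own; the axis labels are my own wording.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up sample in the same shape as the question's data
df_movies_2018 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-01-01", "2018-01-01", "2018-01-01",
        "2018-02-10", "2018-02-10", "2018-03-05",
    ]),
    "Genre": ["romance", "fiction", "romance", "drama", "romance", "fiction"],
})

# One row per day, one column per genre, values are watch counts
df_cross = pd.crosstab(df_movies_2018["Date"], df_movies_2018["Genre"])

fig, ax = plt.subplots()
# x_compat=True turns off pandas' own date tick handling, so the
# matplotlib locator and formatter set below take effect
df_cross.plot(ax=ax, x_compat=True)

# One major tick per month, labeled with the abbreviated month name
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.set_xlabel("Month")
ax.set_ylabel("Number of watches")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the result. The data is still plotted per day; only the tick labels are monthly.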
Upvotes: 2