ITN00bs

Reputation: 55

matplotlib bar graph from a pandas series of value counts

I am trying to plot a graph in Python that shows how often items occur over time. Specifically, I want to count how many items match two categories in each year and plot a graph of those counts.

This is my data in Excel:

[image not available: screenshot of the data in Excel]

What I want to end up with is the set of movies that are both fantasy and action, and how many of them appear in each year. This is the end result I have reached (which is correct):

[image not available: count of matching movies per year]

That is, there are two movies in 2004 that are both fantasy and action, one movie in 2005 that is both fantasy and action, and so on.

Here are the steps which I have taken to get to the result:

#import data:
import pandas as pd
data = pd.read_csv("data.csv")

#put all fantasy movies in a DataFrame (.loc avoids chained indexing):
fantasy_movies = data.loc[data['Genre'] == 'Fantasy', ['Name', 'Genre']]
fantasy_movies = fantasy_movies.rename(columns={'Genre': 'Fantasy'})

#put all action movies in a DataFrame:
action_movies = data.loc[data['Genre'] == 'Action', ['Name', 'Genre']]
action_movies = action_movies.rename(columns={'Genre': 'Action'})

#merge the two datasets:
action_fantasy = pd.merge(fantasy_movies, action_movies)

#obtain a list of unique movie names:
unique = action_fantasy.Name.unique()

#make years the columns and unique names the rows
filter_data = data[data.Name.isin(unique)]
table = filter_data.pivot_table(filter_data, index=['Name'], columns=['year'])

#replace all NaNs with zero
table1 = table.fillna(0)

#Count items in years
table1.gt(0).astype(int).sum(axis=0)

From here, I would like to draw a graph (I'm thinking of a bar graph) using Matplotlib, with the years along the x-axis and bar heights equal to the counts computed from table1 above. I am struggling to create one, even though it should technically be as easy as putting one set of data on the x-axis and another on the y-axis.

Like the code from W3 Schools: https://www.w3schools.com/python/matplotlib_bars.asp

import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.bar(x,y)
plt.show()

I wonder if my data is in the wrong format? What would be my x-axis and y-axis?

Upvotes: 1

Views: 6457

Answers (1)

tdy

Reputation: 41327

I wonder if my data is in the wrong format?

Not "wrong" per se, but the result has a MultiIndex, which is an unnecessary hassle here. I suggest dropping the extra level with Series.droplevel before plotting via pandas, matplotlib, or seaborn.
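
To illustrate what droplevel does, here is a minimal sketch on made-up data. The outer level name 'Rating' is an assumption standing in for whatever numeric column pivot_table aggregated in the question; the counts are copied from the output shown in this answer.

```python
import pandas as pd

# Hypothetical stand-in for the question's result: a Series indexed by
# (aggregated column name, year). 'Rating' is an assumed column name.
index = pd.MultiIndex.from_product(
    [["Rating"], [2004, 2005, 2011, 2016, 2018]],
    names=[None, "year"],
)
counts_multi = pd.Series([2, 1, 1, 1, 2], index=index)

# Dropping level 0 removes the outer label and leaves a flat index of years.
counts = counts_multi.droplevel(0)

print(counts_multi.index.nlevels, "->", counts.index.nlevels)
print(counts)
```

With a flat index of years, the Series can go straight into any of the plotting options.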


pandas bar plot

After dropping the MultiIndex, use Series.plot.bar which plots the values as y against the index as x:

counts = table1.gt(0).astype(int).sum(axis=0).droplevel(0)
# year
# 2004    2
# 2005    1
# 2011    1
# 2016    1
# 2018    2
# dtype: int64

counts.plot.bar(ylabel='total')
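
For anyone without the original data, the following is a self-contained sketch of the same call using the counts shown above as sample data. The rot=0 argument and the output file name are my additions, not part of the snippet above; pandas rotates bar tick labels by 90 degrees by default, and rot=0 keeps the years horizontal.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import pandas as pd

# Sample counts copied from the output above, indexed by year.
years = pd.Index([2004, 2005, 2011, 2016, 2018], name="year")
counts = pd.Series([2, 1, 1, 1, 2], index=years)

# Series.plot.bar uses the index for x and the values for y, and returns the Axes.
ax = counts.plot.bar(ylabel="total", rot=0)

# Hypothetical output file name; use plt.show() in an interactive session instead.
ax.figure.savefig("counts_pandas.png")
```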


matplotlib bar plot

If you really want to use plt.bar, I suggest resetting the Series into a DataFrame and then plotting the total against the range index:

counts = table1.gt(0).astype(int).sum(axis=0).droplevel(0).reset_index(name='total')
#    year  total
# 0  2004      2
# 1  2005      1
# 2  2011      1
# 3  2016      1
# 4  2018      2

import matplotlib.pyplot as plt

plt.bar(counts.index, counts.total)
plt.xticks(ticks=counts.index, labels=counts.year)
plt.xlabel('year')
plt.ylabel('total')
plt.show()
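
As a possible variation, and not the approach above, the flat Series can also be passed to matplotlib without reset_index by converting the years to strings so they are treated as categories. This is a sketch on the sample counts from this answer; the file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample counts copied from the output above, indexed by year.
years = pd.Index([2004, 2005, 2011, 2016, 2018], name="year")
counts = pd.Series([2, 1, 1, 1, 2], index=years, name="total")

fig, ax = plt.subplots()
# String x values are plotted as evenly spaced categories, so there are no
# gaps for the missing years between 2005 and 2011.
bars = ax.bar(counts.index.astype(str), counts.to_numpy())
ax.set_xlabel("year")
ax.set_ylabel("total")

# Hypothetical output file name; use plt.show() in an interactive session instead.
fig.savefig("counts_matplotlib.png")
```

Passing the integer years directly would also work, but the bars would then be placed at their numeric positions with empty space for the years that have no matching movies.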


seaborn bar plot

Alternatively, pass the DataFrame (the reset_index version of counts from the matplotlib section) into sns.barplot:

import seaborn as sns
sns.barplot(data=counts, x='year', y='total')

Upvotes: 1
