Reputation: 339
I'm trying to draw a scatter plot of yearly values with a line showing the mean of the values at each x position (each season/year).
The plot should be something like
(I drew that line onto the graph, and the x-ticks at the bottom should be ordered by "season", ascending.)
I'm getting stuck on the second plot: I get a "tuple index out of range" error when execution reaches the line
ax2.plot(x2, y2, color='r')
I'm not sure I'm even approaching this correctly. My main dataframe holds all of my values, and I created a groupby series with the mean for each season/year combination. I couldn't get that series to plot, so I converted it to a dataframe and reset its index, hoping that would help. It didn't, and I'm not sure where to go from here.
The problems started when I created the Pandas Categorical object, but that was the only way I could think of to get my data sorted correctly. Maybe that's the problem, but I'm not sure how else to get it sorted and to get the labels done right.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
file = r"C:\myfile.xlsx"
df = pd.read_excel(file)
season = ["Spring 2008", "Summer 2008", "Fall 2008",
"Spring 2009", "Summer 2009", "Fall 2009",
"Spring 2010", "Summer 2010", "Fall 2010",
"Spring 2011", "Summer 2011", "Fall 2011",
"Spring 2012", "Summer 2012", "Fall 2012",
"Spring 2013", "Summer 2013", "Fall 2013",
"Spring 2014", "Summer 2014", "Fall 2014",
"Spring 2015", "Summer 2015", "Fall 2015",
"Spring 2016", "Summer 2016", "Fall 2016",
"Spring 2017", "Summer 2017", "Fall 2017",
"Spring 2018", "Summer 2018", "Fall 2018",
"Spring 2019"]
df = df.loc[df['Total'] > 100]
df['Season_Year'] = df.apply(lambda row: row.Semester + " " + str(row.Year), axis=1)
df['Season_Year'] = pd.Categorical(df['Season_Year'], season)
df.sort_values(by='Season_Year', inplace=True, ascending=True)
df = df.dropna()
df['Score'] = df.apply(lambda row: row.Respondents / row.Total, axis=1)
grouped = df.groupby('Season_Year')['Score'].mean()
grouped = grouped.dropna()
df2 = grouped.to_frame()
df2 = df2.reset_index()
df2.head()
x = df['Season_Year']
y = df['Score']
x2 = df2['Season_Year']
y2 = df2['Score']
fig, ax = plt.subplots()
ax.scatter(x, y, marker='o', color='black')
ax2 = ax.twinx()
ax2.plot(x2, y2, color='r')
ax.set_ylim(0, 1.1)
ax2.set_ylim(0, 1.1)
ax.set_xticklabels(season, rotation='vertical')
plt.show()
Upvotes: 0
Views: 1223
Reputation: 2515
You can graph them (almost) directly, in one line, like this:
ax2 = ax.twinx()
ax2.plot(list(x2.values), list(y2.values), color='r')
Or, you can extract the values explicitly, into lists, like this:
x2 = [x2[n] for n in range(x2.shape[0])]
y2 = [y2[n] for n in range(y2.shape[0])]
And then graph them as you do in your example:
ax2 = ax.twinx()
ax2.plot(x2, y2, color='r')
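If you would rather keep the categorical out of the plotting calls entirely, here is a rough sketch of a different approach from the list conversion above: plot against the integer position of each season (the categorical codes) and put the season names on as tick labels afterwards. The small dataframe, the shortened season list, and the "pos" column are made up for illustration; your real data comes from the Excel file. It also draws the line on the same axes as the scatter, on the assumption that both share the 0 to 1.1 range and a twin axis isn't needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data (the real frame is read from the Excel file)
season = ["Spring 2008", "Summer 2008", "Fall 2008", "Spring 2009"]
df = pd.DataFrame({
    "Season_Year": ["Fall 2008", "Spring 2008", "Spring 2008",
                    "Summer 2008", "Fall 2008"],
    "Score": [0.5, 0.2, 0.4, 0.6, 0.7],
})

# Ordered categorical: the order of `season` defines the sort order
df["Season_Year"] = pd.Categorical(df["Season_Year"], categories=season, ordered=True)

# Integer position of each row's season within `season` ("pos" is an illustrative name)
df["pos"] = df["Season_Year"].cat.codes

# Mean score per position; seasons with no rows simply don't appear
means = df.groupby("pos")["Score"].mean()
x2 = list(means.index)
y2 = list(means.values)

fig, ax = plt.subplots()
ax.scatter(list(df["pos"]), list(df["Score"]), marker="o", color="black")
ax.plot(x2, y2, color="r")  # same axes as the scatter, so no twinx()
ax.set_ylim(0, 1.1)

# Fix the tick positions first, then label them, so each label lines up with its season
ax.set_xticks(range(len(season)))
ax.set_xticklabels(season, rotation="vertical")
```

Call plt.show() (without the Agg line) or fig.savefig(...) to see the result. Setting the tick positions before the labels matters: set_xticklabels on its own just relabels whatever ticks matplotlib happened to choose.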
Upvotes: 1