I am trying to create a stacked area chart showing how the number of courses in each area changes over time. My data frame (indexed by Year) looks like this:
                    Area  Courses
Year
1900         Agriculture      0.0
1900        Architecture     32.0
1900           Astronomy     10.0
1900             Biology     20.0
1900           Chemistry     25.0
1900   Civil Engineering     21.0
1900           Education     14.0
1900  Engineering Design     10.0
1900             English     30.0
1900           Geography      1.0
The last year in the data is 2011.
I tried several solutions, such as df.plot.area() and df.plot.area(x='Years'). Then I thought it would help to have the areas as columns, so I tried:
df.pivot_table(index = 'Year', columns = 'Area', values = 'Courses', aggfunc = 'sum')
but instead of the sum of courses per year, I got:
Area  Aeronautical Engineering  ...  Visual Design
Year                            ...
1900                       NaN  ...            NaN
1901                       NaN  ...            NaN
Thanks for your help. It's my first post. Sorry if I missed something.
Update. Here is my code:
df = pd.read_csv(filepath, encoding= 'unicode_escape')
df = df.groupby(['Year','GenArea'])['Taught'].sum().to_frame(name = 'Courses').reset_index()
plt.stackplot(df['Year'], df['Courses'], labels = df['GenArea'])
plt.legend(loc='upper left')
plt.show()
And here is the link for the dataset: https://data.world/makeovermonday/2020w12
Upvotes: 0
Views: 246
Reputation:
With the extra given information I made this. Hope you like it!
import pandas as pd
import matplotlib.pyplot as plt

plt.close('all')
df = pd.read_csv('https://query.data.world/s/djx5mi7dociacx7smdk45pfmwp3vjo',
                 encoding='unicode_escape')
# Total courses taught per (Year, GenArea) pair, in long format
df = (df.groupby(['Year', 'GenArea'])['Taught'].sum()
        .to_frame(name='Courses').reset_index())

# Collect the distinct areas and years in order of first appearance
aux1 = df.duplicated(subset='GenArea', keep='first').values
aux2 = df.duplicated(subset='Year', keep='first').values
n = len(aux1)
year = []
courses = []
for i in range(n):
    if not aux1[i]:
        courses.append(df.iloc[i]['GenArea'])
    if not aux2[i]:
        year.append(df.iloc[i]['Year'])
del aux1, aux2

# Wide table: one row per year, one column per area, initialised to 0.0
df1 = pd.DataFrame(index=year)
for area in courses:
    df1[area] = 0.0
# Fill each cell using the row's own Year. Advancing a row counter only when
# a row has no zeros left would stall on genuine zeros (e.g. Agriculture in
# 1900) or on areas that are absent in a given year.
for i in range(n):
    row = df.iloc[i]
    df1.at[row['Year'], row['GenArea']] = row['Courses']
del year, courses, df

# Reverse the column order, then draw the stacked area chart
df1 = df1[df1.columns[::-1]]
df1.plot.area(legend='reverse')
plt.show()
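For what it's worth, the same long-to-wide reshaping can probably be done without the loops by pivoting on the column names from the updated code (GenArea and Courses). This is only a sketch under assumptions: long_df is a small made-up sample standing in for the grouped frame, and to_wide and plot_stacked_area are hypothetical helper names, not part of pandas.

```python
import pandas as pd

# Made-up sample in the same long format as the grouped frame (assumption:
# the real data has the columns Year, GenArea and Courses).
long_df = pd.DataFrame({
    "Year":    [1900, 1900, 1901, 1901, 1902],
    "GenArea": ["Biology", "English", "Biology", "English", "Biology"],
    "Courses": [20.0, 30.0, 22.0, 31.0, 25.0],
})


def to_wide(df):
    """Return one row per year and one column per area.

    Year/area pairs that never occur are filled with 0 instead of NaN,
    so the stacked areas stay continuous.
    """
    return df.pivot_table(index="Year", columns="GenArea",
                          values="Courses", aggfunc="sum", fill_value=0)


def plot_stacked_area(wide):
    """Draw the stacked area chart and return the matplotlib Axes."""
    return wide.plot.area(legend="reverse")


wide = to_wide(long_df)
print(wide)
```

Calling plot_stacked_area(wide) and then plt.show() (with matplotlib.pyplot imported as plt) should draw the chart. Note that Year has to be a regular column here, not the index name only, which is why the grouped frame is passed through reset_index() first.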
Upvotes: 1