Reputation: 31
I'm new to pandas, and one of my functions does not behave as expected. I have this dataframe:
   title_year        gross
0        2009  7.60506e+08
1        2007  3.09404e+08
2        2015  2.00074e+08
3        2012  4.48131e+08
5        2012  7.30587e+07
6        2007   3.3653e+08
7        2010  2.00807e+08
8        2015  4.58992e+08
9        2009  3.01957e+08
The function is:
def analysis_gross_per_year(year1, year2):
    year_df = data[['title_year', 'gross']]
    check = True
    year_df.title_year = year_df.title_year.fillna('Not Given')
    year_df.gross = year_df.gross.fillna('Not Given')
    year_df = year_df[year_df.gross != 'Not Given']
    gross_year = year_df[year_df.title_year.str.contains(year1, na=True)]
    number = int(year1)
    while check:
        if str(number) == year2:
            check = False
        else:
            number = number + 1
            df1 = year_df[year_df.title_year.str.contains(str(number), na=False)]
            gross_year = pd.concat([gross_year, df1])
            print(df1)
I pass the function two parameters, year1 and year2, and it should display a line graph of the average, minimum, and maximum gross earnings for the years in that range.
For example, given 2013 and 2015, it should display a line graph covering 2013, 2014, and 2015. However, when I run str.contains(year1, na=True), it returns an empty dataframe. Can you tell me why?
Upvotes: 1
Views: 184
Reputation: 836
Provided your title_year column is cast to an int, you could do something like the following.
import matplotlib.pyplot as plt
%matplotlib inline

def range_plot(year1, year2, agg):
    for a in agg:  # iterate through aggregate methods
        _ = df[df['title_year'].between(year1, year2)]  # subset DataFrame to contain only the year ranges specified
        _ = _.groupby('title_year').agg(a)  # groupby title_year, compute summary statistic
        plt.plot(_.index.values, _['gross'], label=a)  # plot
    plt.legend()  # display legend
    plt.xlabel('Year')
    plt.ylabel('Gross')
    plt.title("{} - {}".format(year1, year2))
year1 and year2 are ints (both ends of the range are included), and agg is a list of the aggregate functions you want to plot.
range_plot(2009, 2015, ['mean', 'sum', 'min', 'max'])
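If title_year is not an int column yet, something along these lines might get it there before calling range_plot. This is only a sketch: the sample frame is made up to resemble the question's data, and it assumes rows with a missing or unparseable year can simply be dropped.

```python
import pandas as pd

# Hypothetical sample resembling the question's data, with years stored as strings
df = pd.DataFrame({
    "title_year": ["2009", "2007", "2015", "2012", None, "2012"],
    "gross": [7.60506e+08, 3.09404e+08, 2.00074e+08, 4.48131e+08, 1.0e+08, 7.30587e+07],
})

# Convert to numbers; anything that cannot be parsed becomes NaN
df["title_year"] = pd.to_numeric(df["title_year"], errors="coerce")

# Drop rows missing a year or a gross value, then cast the year to int
df = df.dropna(subset=["title_year", "gross"]).astype({"title_year": int})

# between() is inclusive on both ends by default
subset = df[df["title_year"].between(2009, 2012)]

# One row per year, one column per summary statistic
summary = subset.groupby("title_year")["gross"].agg(["mean", "min", "max"])
print(summary)
```

Working with real NaN values and dropna avoids mixing a 'Not Given' string into a numeric column, which turns the column into object dtype and makes numeric comparisons unreliable.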
Upvotes: 1
Reputation: 1168
I am also a bit confused by the given code snippet, but if you just want to select certain years (as str values) in the dataframe, you could for example create a list of the years and then filter the dataframe accordingly.
years_to_select = ['2012', '2013', '2014']
filtered_df = original_df[original_df['year'].isin(years_to_select)]
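If the years come in as a start and an end value, the list could be built from a range instead of being typed out. A minimal sketch, assuming the 'year' column holds strings; years_between is a made-up helper and the sample frame is only for illustration.

```python
import pandas as pd

# Hypothetical frame; the 'year' column holds strings, as assumed above
original_df = pd.DataFrame({
    "year": ["2009", "2012", "2013", "2015"],
    "gross": [3.0e+08, 4.5e+08, 1.2e+08, 2.0e+08],
})

def years_between(year1, year2):
    """Return every year from year1 to year2 (inclusive) as strings."""
    return [str(y) for y in range(int(year1), int(year2) + 1)]

years_to_select = years_between("2012", "2014")
filtered_df = original_df[original_df["year"].isin(years_to_select)]
print(filtered_df)
```

Unlike str.contains, isin matches whole values, so it also works unchanged on an int column if the list holds ints instead of strings.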
Upvotes: 0