Pavel Vasilev

Reputation: 31

dataframe + pandas + select specific rows

I'm new to pandas, and one of my functions does not behave as expected. I have this dataframe:

     title_year        gross
0          2009  7.60506e+08
1          2007  3.09404e+08
2          2015  2.00074e+08
3          2012  4.48131e+08
5          2012  7.30587e+07
6          2007   3.3653e+08
7          2010  2.00807e+08
8          2015  4.58992e+08
9          2009  3.01957e+08

The function is:

def analysis_gross_per_year(year1, year2):
    year_df = data[['title_year', 'gross']]
    check = True
    year_df.title_year = year_df.title_year.fillna('Not Given')
    year_df.gross = year_df.gross.fillna('Not Given')
    year_df = year_df[year_df.gross != 'Not Given']
    gross_year = year_df[year_df.title_year.str.contains(year1, na=True)]
    number = int(year1)
    while check :
        if str(number) == year2:
            check = False
        else:
            number = number + 1
            df1 = year_df[year_df.title_year.str.contains(str(number), na=False)]
            gross_year = pd.concat([gross_year, df1])
            print (df1)

I pass the function two parameters, year1 and year2, and it should display a line graph of the average, minimum, and maximum gross earnings for the years in that range.

For example, given 2013 and 2015, it should display a line graph for 2013, 2014, and 2015. However, when I run str.contains(year1, na=True), it returns an empty dataframe. Can you tell me why?

Upvotes: 1

Views: 184

Answers (2)

Ben

Reputation: 836

Provided your title_year column is cast to an int, you could do something like the following.

import matplotlib.pyplot as plt
# IPython/Jupyter magic; drop the next line when running as a plain Python script
%matplotlib inline

def range_plot(year1, year2, agg):
    # subset the DataFrame once to the requested year range (between is inclusive on both ends)
    subset = df[df['title_year'].between(year1, year2)]
    for a in agg:  # iterate through aggregate methods
        stats = subset.groupby('title_year')['gross'].agg(a)  # group by title_year, compute summary statistic
        plt.plot(stats.index.values, stats.values, label=a)  # one line per statistic

    plt.legend()  # display legend
    plt.xlabel('Year')
    plt.ylabel('Gross')
    plt.title("{} - {}".format(year1, year2))

year1 and year2 are ints, agg is a list of the aggregate functions you want to plot, and df is assumed to be your DataFrame.

range_plot(2009, 2015, ['mean', 'sum', 'min', 'max'])
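The approach above relies on title_year being an integer column. As a minimal sketch of that preparation step, the snippet below coerces both columns to numbers, drops incomplete rows, and computes the per-year statistics without plotting. The function name gross_stats is hypothetical, the sample rows are copied from the question, and pd.to_numeric with errors="coerce" is just one way to do the cast.

```python
import pandas as pd

# Sample rows copied from the question; `data` is the name used there.
data = pd.DataFrame({
    "title_year": [2009, 2007, 2015, 2012, 2012, 2007, 2010, 2015, 2009],
    "gross": [7.60506e+08, 3.09404e+08, 2.00074e+08, 4.48131e+08,
              7.30587e+07, 3.3653e+08, 2.00807e+08, 4.58992e+08, 3.01957e+08],
})

def gross_stats(frame, year1, year2):
    """Hypothetical helper: mean/min/max gross per year for year1..year2 inclusive."""
    clean = frame[["title_year", "gross"]].copy()
    # Turn anything non-numeric into NaN instead of filling with a string,
    # so both columns keep a numeric dtype.
    clean["title_year"] = pd.to_numeric(clean["title_year"], errors="coerce")
    clean["gross"] = pd.to_numeric(clean["gross"], errors="coerce")
    clean = clean.dropna()
    clean["title_year"] = clean["title_year"].astype(int)
    # Keep only the requested range, then aggregate per year.
    in_range = clean[clean["title_year"].between(year1, year2)]
    return in_range.groupby("title_year")["gross"].agg(["mean", "min", "max"])

stats = gross_stats(data, 2009, 2012)
print(stats)
```

The resulting table has one row per year present in the range and one column per statistic, so stats.plot() would draw one line per column if a chart is needed.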


Upvotes: 1

LeBavarois

Reputation: 1168

I am also a bit confused by the given code snippet, but if you just want to select certain years (as str values) in the dataframe, you could for example create a list of the years and then filter the dataframe accordingly.

years_to_select = ['2012', '2013', '2014']
filtered_df = original_df[original_df['year'].isin(years_to_select)]
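A small sketch of the same idea applied to a year range, assuming the column is called title_year as in the question and holds numeric years. isin compares values as they are, so a list of strings such as '2012' should not be expected to match integer years; the hypothetical helper select_years below therefore converts both sides to integers first.

```python
import pandas as pd

# A few sample rows taken from the question; original_df is the answer's name.
original_df = pd.DataFrame({
    "title_year": [2009, 2007, 2015, 2012, 2012],
    "gross": [7.60506e+08, 3.09404e+08, 2.00074e+08, 4.48131e+08, 7.30587e+07],
})

def select_years(frame, first_year, last_year, column="title_year"):
    """Hypothetical helper: rows whose year lies in first_year..last_year inclusive."""
    # Accept years given as strings or ints and build the full list of years.
    years = list(range(int(first_year), int(last_year) + 1))
    # Compare numbers with numbers; unparseable entries become NaN and never match.
    as_number = pd.to_numeric(frame[column], errors="coerce")
    return frame[as_number.isin(years)]

filtered_df = select_years(original_df, "2012", "2015")
print(filtered_df)
```

Rows keep their original order and index, so the result can be passed straight to groupby or a plotting call.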

Upvotes: 0
