Getting seasons from dataset by using Pandas

Question

Given the following dataset:

"";"M_001";"M_002";"M_003";"M_004"
"2011-01-01 00:00:00";4,45;3,5467;3,197;12,098
"2011-02-01 00:00:00";18,40;0,124;174,36;11,098
"2011-03-01 00:00:00";25,789;27,67;19,76;34,66
"2011-04-01 00:00:00";19,08;11,078;23,34;67,45
"2011-05-01 00:00:00";13,06;06,078;10,34;21,45
"2011-06-01 00:00:00";13,06;06,078;10,34;21,45
"2011-06-21 00:00:00";13,06;06,078;10,34;21,45
"2011-07-01 00:00:00";9,06;06,078;9,34;21,45
"2011-07-14 00:00:00";9,06;06,078;9,34;21,45
"2011-08-01 00:00:00";22,06;45,078;21,34;21,45
"2011-08-11 00:00:00";22,06;45,078;21,34;21,45
"2011-08-12 00:00:00";22,06;45,078;21,34;21,45
"2011-09-01 00:00:00";76,06;32,078;10,34;21,45
"2011-09-23 00:00:00";76,06;32,078;10,34;21,45
"2011-09-25 00:00:00";76,06;32,078;10,34;21,45
"2011-10-01 00:00:00";17,06;18,078;108,34;21,45
"2011-11-01 00:00:00";12,06;45,078;107,34;21,45
"2011-12-01 00:00:00";7,06;60,078;83,34;21,45
"2011-12-21 00:00:00";7,06;60,078;83,34;21,45
"2012-01-01 00:00:00";4,45;3,5467;3,197;12,098
"2012-02-01 00:00:00";18,40;0,124;174,36;11,098
"2012-03-01 00:00:00";25,789;27,67;19,76;34,66
"2012-03-11 00:00:00";25,789;27,67;19,76;34,66
"2012-03-20 00:00:00";25,789;27,67;19,76;34,66
"2012-03-30 00:00:00";25,789;27,67;19,76;34,66

Could anyone tell me how to modify the function calc() so that I can separately select the rows for the winter season (from 21 December to 20 March) and the summer season (from 21 June to 23 September) from the data loaded by read_csv?

I have already tried the following code, but it doesn't work as expected.

import pandas as pd 

def calc():
    filename = 'mydataset/dataset.csv'
    mySeries = pd.read_csv(filename, header=0, index_col=0, parse_dates=[0], sep=";", decimal=",")

    return mySeries

if __name__ == '__main__':
    df = calc()
    print("Winter season measures: ")
    print(df.iloc[[x in range(12, 3) for x in df.index.month]])
    print("Summer season measures: ")
    print(df.iloc[[x in range(6, 10) for x in df.index.month]])

Thank you in advance!

BenG · Accepted Answer

I recreated your DF here:

from io import StringIO
import pandas as pd 
text = StringIO('''"";"M_001";"M_002";"M_003";"M_004"
"2011-01-01 00:00:00";4,45;3,5467;3,197;12,098
"2011-02-01 00:00:00";18,40;0,124;174,36;11,098
"2011-03-01 00:00:00";25,789;27,67;19,76;34,66
"2011-04-01 00:00:00";19,08;11,078;23,34;67,45
"2011-05-01 00:00:00";13,06;06,078;10,34;21,45
"2011-06-01 00:00:00";13,06;06,078;10,34;21,45
"2011-06-21 00:00:00";13,06;06,078;10,34;21,45
"2011-07-01 00:00:00";9,06;06,078;9,34;21,45
"2011-07-14 00:00:00";9,06;06,078;9,34;21,45
"2011-08-01 00:00:00";22,06;45,078;21,34;21,45
"2011-08-11 00:00:00";22,06;45,078;21,34;21,45
"2011-08-12 00:00:00";22,06;45,078;21,34;21,45
"2011-09-01 00:00:00";76,06;32,078;10,34;21,45
"2011-09-23 00:00:00";76,06;32,078;10,34;21,45
"2011-09-25 00:00:00";76,06;32,078;10,34;21,45
"2011-10-01 00:00:00";17,06;18,078;108,34;21,45
"2011-11-01 00:00:00";12,06;45,078;107,34;21,45
"2011-12-01 00:00:00";7,06;60,078;83,34;21,45
"2011-12-21 00:00:00";7,06;60,078;83,34;21,45
"2012-01-01 00:00:00";4,45;3,5467;3,197;12,098
"2012-02-01 00:00:00";18,40;0,124;174,36;11,098
"2012-03-01 00:00:00";25,789;27,67;19,76;34,66
"2012-03-11 00:00:00";25,789;27,67;19,76;34,66
"2012-03-20 00:00:00";25,789;27,67;19,76;34,66
"2012-03-30 00:00:00";25,789;27,67;19,76;34,66''')
df = pd.read_csv(filepath_or_buffer=text, sep=';', header=0, index_col=0, decimal=',', parse_dates=[0])

Then I wrote some code that creates two new DataFrames and appends the rows that fall within your winter and summer ranges. EDIT: The old version is commented out but preserved below.

winterStart = '-12-21'
winterEnd   = '-03-20'
summerStart = '-06-21'
summerEnd   = '-09-23'

#df_winter = df.ix[str('2010'+winterStart):str('2011'+winterEnd)]
#df_winter = df_winter.append(df.ix['2011'+winterStart:'2012'+winterEnd])
#df_winter = df_winter.append(df.ix['2012'+winterStart:'2013'+winterEnd])

#df_summer = df.ix['2010'+summerStart:'2010'+summerEnd]
#df_summer = df_summer.append(df.ix['2011'+summerStart:'2011'+summerEnd])
#df_summer = df_summer.append(df.ix['2012'+summerStart:'2012'+summerEnd])

If you have more years, you can write a loop that iterates over the years and appends each year's seasonal data. EDIT: The OP asked for this functionality, so I added a loop that collects all years without spelling out each year for each season. Another comment mentioned that df.ix[] is deprecated, so the code now uses df.loc[] instead of df.ix[] as in the previous version.

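# DataFrame.append was removed in pandas 2.0, so the yearly slices are
# collected in a list and combined once with pd.concat.
winter_parts = []
for year in range(2010, 2015):
    # Winter spans two calendar years: 21 December of `year` to 20 March of `year + 1`.
    winter_parts.append(df.loc[str(year) + winterStart : str(year + 1) + winterEnd])
df_winter = pd.concat(winter_parts)
print(df_winter)

summer_parts = []
for year in range(2010, 2015):
    # Summer starts and ends within the same calendar year.
    summer_parts.append(df.loc[str(year) + summerStart : str(year) + summerEnd])
df_summer = pd.concat(summer_parts)
print(df_summer)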
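
As a side note on the attempt in the question: range(12, 3) is empty, so the winter filter there never matches anything, and a month-only test such as range(6, 10) also lets in dates before 21 June and after 23 September. One possible alternative to the loop, sketched here and not part of the original answer, is to build a boolean mask from month and day, which works for any number of years. The names sample, md, winter_mask and summer_mask are illustrative, and the sample is a small subset of the question's data (columns M_001 and M_003 only).

```python
from io import StringIO
import pandas as pd

# Small sample in the same format as the question's file
# (semicolon-separated, decimal commas); a subset of the original rows.
sample = StringIO('''"";"M_001";"M_003"
"2011-03-01 00:00:00";25,789;19,76
"2011-06-01 00:00:00";13,06;10,34
"2011-06-21 00:00:00";13,06;10,34
"2011-09-23 00:00:00";76,06;10,34
"2011-09-25 00:00:00";76,06;10,34
"2011-12-01 00:00:00";7,06;83,34
"2011-12-21 00:00:00";7,06;83,34
"2012-03-20 00:00:00";25,789;19,76
"2012-03-30 00:00:00";25,789;19,76''')
df = pd.read_csv(sample, sep=';', header=0, index_col=0, decimal=',', parse_dates=[0])

# Encode each date as month * 100 + day, e.g. 21 December -> 1221.
md = df.index.month * 100 + df.index.day

# Winter wraps around the new year, so it is the union of two intervals.
winter_mask = (md >= 1221) | (md <= 320)
# Summer lies within a single calendar year, so one interval is enough.
summer_mask = (md >= 621) & (md <= 923)

df_winter = df[winter_mask]
df_summer = df[summer_mask]

print("Winter season measures:")
print(df_winter)
print("Summer season measures:")
print(df_summer)
```

Because the mask ignores the year, there is no need to know in advance which years the file covers. The trade-off is that a winter is no longer kept together as one contiguous block per season, which matters only if you later want to group by season and year.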

Also, please see "Filtering Pandas DataFrames on dates" for how to filter between date ranges when the date is the index.
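
For reference, here is a minimal sketch of the two usual ways to filter a date range when the date is the index. The tiny frame below is made up from a few of the question's M_001 values, and the names idx, summer_2011 and summer_2011_mask are illustrative only.

```python
import pandas as pd

# A tiny frame with a sorted DatetimeIndex (values taken from column M_001).
idx = pd.to_datetime(['2011-06-01', '2011-06-21', '2011-07-14',
                      '2011-09-23', '2011-09-25'])
df = pd.DataFrame({'M_001': [13.06, 13.06, 9.06, 76.06, 76.06]}, index=idx)

# Option 1: label slicing with date strings. On a sorted DatetimeIndex
# both endpoints are included in the result.
summer_2011 = df.loc['2011-06-21':'2011-09-23']

# Option 2: an explicit boolean mask, which does not rely on the index
# being sorted.
mask = (df.index >= pd.Timestamp('2011-06-21')) & (df.index <= pd.Timestamp('2011-09-23'))
summer_2011_mask = df[mask]

print(summer_2011)
```

Both options select the same rows here. Slicing is shorter, while the mask is the safer choice if the index might not be in chronological order.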
