Reputation: 1637
I'm a newbie still trying to figure out Pandas - split/apply/combine is just starting to make sense but I'm not there yet.
I'm trying to get the historical high and low sales call numbers from the same day in previous years (2013-2015), and then have another column that tells me what days the 2016 calls were higher than previous highs or lower than previous lows and what the values were.
Here's my attempt so far:
df = pd.read_csv('filename.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df[(df['Date']>= '01/01/2013') & (df['Date'] <= '12/31/2015')]
df['Month'] = df.Date.dt.month
df['Day'] = df.Date.dt.day
dfMAX = df[df['Element'] == "MAX"]
dfMAX = (dfMAX.groupby([dfMAX.Date.dt.month, dfMAX.Date.dt.day, 'Element'])
[['Data_Value']]
.agg(['max']))
This gets me the max values per day and I know I could repeat this for the min values. I'm struggling to know how to put it back together with the multi-index and then how to get the 2016 values that were either higher or lower than the min/max already recorded.
I'm not worried about leap year data, Feb 29th can be dropped/ignored for this.
The output would be something like this:
Day of Year Min Max 2016
1/1 50 1900
1/2 23 2100 2102
1/3 90 1800 85
1/4 89 1750
1/5 50 2309 45
1/6 44 5600 5649
I've posted a csv file here: https://drive.google.com/open?id=0B4xdnV0LFZI1dUE3ZFBxdWFQOGc
Thank you for your help, Me
Upvotes: 2
Views: 5235
Reputation: 153460
I would tackle this using the follow steps:
.dt.dayofyear
.groupby
with .agg
of
min and max to get your min and max values for 2013 thru 2015 to create a
dataframe daily records.pd.merge
, you can set the index on 2016 to date and the
merge on indexes using left_index=True
and right_index=True
.You should get something like this with your data:
min max Date Element Value
DayofYear
1 545 1812 2016-01-01 MAX 1887
3 108 1815 2016-01-03 MAX 1906
4 496 1618 2016-01-04 MAX 1701
6 455 1864 2016-01-06 MIN 169
8 511 1771 2016-01-08 MIN 232
See my spoiler code hover below.
df = pd.read_csv('Downloads/Day_Over_Day.csv', parse_dates=['Date'])
1. df['DayofYear']=df['Date'].dt.dayofyear
2a. df_13_15 = df.query('Date < "2016-01-01"')
2b. df_2016 = df.query('Date >= "2016-01-01"')
3. & 4. df_Overlay = pd.merge(df_13_15.groupby('DayofYear') ['Value'].agg(['min','max']),df_2016.set_index('DayofYear'), left_index=True, right_index=True)
5. print(df_Overlay[(df_Overlay['Value'] < df_Overlay['min']) | (df_Overlay['Value'] > df_Overlay['max'])].head())
Upvotes: 3