Reputation: 53
I have a database object that returns my query outputs as a pandas df.
One of my queries generates a list of dates (df1):
data_interestDate
0 2020-07-15T00:00:00
1 2020-06-11T00:00:00
2 2020-05-14T00:00:00
3 2020-04-14T00:00:00
The other query returns list of values corresponding to several dates (df2):
data_date value
0 2020-07-21T00:00:00 47.0
1 2020-07-20T00:00:00 46.0
2 2020-07-17T00:00:00 50.0
3 2020-07-16T00:00:00 46.0
4 2020-07-15T00:00:00 48.0
5 2020-07-14T00:00:00 49.0
6 2020-07-13T00:00:00 48.0
7 2020-07-10T00:00:00 49.0
8 2020-07-09T00:00:00 46.0
9 2020-07-08T00:00:00 51.0
10 2020-07-07T00:00:00 49.0
11 2020-07-06T00:00:00 53.0
I want to iterate through df1 and find the matching dates in df2. Once I have that, in df2 I want to take the difference between the value corresponding to that date and the value x number of rows before that. For example, for 2020-07-15T00:00:00 in df1, I would find that date in df2 and then do something like:
(df2['value']-df2['value'].shift(-5)).iloc()[0]
which should return 3.0, and then overall output of
2020-07-15T00:00:00 -5 3.0
2020-06-11T00:00:00 -5 ...
2020-05-14T00:00:00 -5 ...
2020-04-14T00:00:00 -5 ...
Upvotes: 3
Views: 157
Reputation: 36
I found myself doing some formatting to get you to where you wanted your format to be, but:
Query DF 1 (df1):
df1.head()
data_interestDate
0 2020-07-15T00:00:00
1 2020-06-11T00:00:00
2 2020-05-14T00:00:00
3 2020-04-14T00:00:00
Query DF 2 (df2):
df2.head()
data_date value
0 2020-07-21T00:00:00 47.0
1 2020-07-20T00:00:00 46.0
2 2020-07-17T00:00:00 50.0
3 2020-07-16T00:00:00 46.0
4 2020-07-15T00:00:00 48.0
Identify your shift value:
shift_val = -5
Set your literal column (since the output you desired had it):
df2['shift'] = shift_val
Generate a df2 with a 'diff' column, giving the absolute value difference of value and value.shift(shift_value):
df2.loc[(df2['data_date'].isin(df1['data_interestDate'])), 'diff'] = abs(pd.to_numeric(df2['value'])-pd.to_numeric(df2['value'].shift(shift_val)))
Your question seems to say you want to return df2, but your examples shows a df1 return. Here's both:
Returning df1 (as df3):
df3 = df1.merge(df2, left_on='data_interestDate', right_on='data_date', how='left').drop(['data_date','value'], axis=1)
df3['shift'] = shift_val
There we do lose the shift value on the join, so it's added back (no idea on this one). Outputs:
df3.head()
data_interestDate shift diff
0 2020-07-15T00:00:00 -5 3.0
1 2020-06-11T00:00:00 -5 NaN
2 2020-05-14T00:00:00 -5 NaN
3 2020-04-14T00:00:00 -5 NaN
Returning df2 (as df3):
df3 = df2[df2['data_date'].isin(df1['data_interestDate'])].drop(['value'],axis=1)
Outputs:
df3.head()
data_date shift diff
4 2020-07-15T00:00:00 -5 3.0
Upvotes: 1