Finding row in pandas df and performing a diff relative to that row location

Question

I have a database object that returns my query outputs as a pandas df.

One of my queries generates a list of dates (df1):

      data_interestDate
0  2020-07-15T00:00:00
1  2020-06-11T00:00:00
2  2020-05-14T00:00:00
3  2020-04-14T00:00:00

The other query returns list of values corresponding to several dates (df2):

              data_date value
0   2020-07-21T00:00:00  47.0
1   2020-07-20T00:00:00  46.0
2   2020-07-17T00:00:00  50.0
3   2020-07-16T00:00:00  46.0
4   2020-07-15T00:00:00  48.0
5   2020-07-14T00:00:00  49.0
6   2020-07-13T00:00:00  48.0
7   2020-07-10T00:00:00  49.0
8   2020-07-09T00:00:00  46.0
9   2020-07-08T00:00:00  51.0
10  2020-07-07T00:00:00  49.0
11  2020-07-06T00:00:00  53.0

I want to iterate through df1 and find the matching dates in df2. Once I have that, in df2 I want to take the difference between the value corresponding to that date and the value x number of rows before that. For example, for 2020-07-15T00:00:00 in df1, I would find that date in df2 and then do something like:

(df2['value']-df2['value'].shift(-5)).iloc()[0]

which should return 3.0, and then overall output of

2020-07-15T00:00:00  -5  3.0
2020-06-11T00:00:00  -5  ...
2020-05-14T00:00:00  -5  ...
2020-04-14T00:00:00  -5  ...

Kate · Accepted Answer

I found myself doing some formatting to get you to where you wanted your format to be, but:

Query DF 1 (df1):

df1.head()

data_interestDate
0   2020-07-15T00:00:00
1   2020-06-11T00:00:00
2   2020-05-14T00:00:00
3   2020-04-14T00:00:00

Query DF 2 (df2):

df2.head()

    data_date   value
0   2020-07-21T00:00:00 47.0
1   2020-07-20T00:00:00 46.0
2   2020-07-17T00:00:00 50.0
3   2020-07-16T00:00:00 46.0
4   2020-07-15T00:00:00 48.0

Identify your shift value:

shift_val = -5

Set your literal column (since the output you desired had it):

df2['shift'] = shift_val

Generate a df2 with a 'diff' column, giving the absolute value difference of value and value.shift(shift_value):

df2.loc[(df2['data_date'].isin(df1['data_interestDate'])), 'diff'] = abs(pd.to_numeric(df2['value'])-pd.to_numeric(df2['value'].shift(shift_val)))

Your question seems to say you want to return df2, but your examples shows a df1 return. Here's both:

Returning df1 (as df3):

df3 = df1.merge(df2, left_on='data_interestDate', right_on='data_date', how='left').drop(['data_date','value'], axis=1)
df3['shift'] = shift_val

There we do lose the shift value on the join, so it's added back (no idea on this one). Outputs:

df3.head()

    data_interestDate   shift   diff
0   2020-07-15T00:00:00 -5  3.0
1   2020-06-11T00:00:00 -5  NaN
2   2020-05-14T00:00:00 -5  NaN
3   2020-04-14T00:00:00 -5  NaN

Returning df2 (as df3):

df3 = df2[df2['data_date'].isin(df1['data_interestDate'])].drop(['value'],axis=1)

Outputs:

df3.head()

data_date   shift   diff
4   2020-07-15T00:00:00 -5  3.0

Finding row in pandas df and performing a diff relative to that row location

Answers (1)

Related Questions