Wisdom258
Wisdom258

Reputation: 173

Create conditional column for Date Difference based on matching values in two columns

I have a dataframe, I am struggling to create a column based out of other columns, I will share the problem for a sample data.

        Date    Target1        Close
0   2019-04-17  209.2440    203.130005
1   2019-04-17  212.2155    203.130005
2   2019-04-17  213.6330    203.130005
3   2019-04-17  213.0555    203.130005
4   2019-04-17  212.6250    203.130005
5   2019-04-17  212.9820    203.130005
6   2019-04-17  213.1395    203.130005
7   2019-04-16  209.2860    199.250000
8   2019-04-16  209.9055    199.250000
9   2019-04-16  210.3045    199.250000

I want to create another column (for each observation) (called days_to_hit_target for example) which is the difference of days such that close hits (or comes very close to target of specific day), when it does closes that very closely, then it counts the difference of days and put them in the column days_to_hit_target.

Upvotes: 0

Views: 97

Answers (3)

Lante Dellarovere
Lante Dellarovere

Reputation: 1858

maybe a little faster solution:

import pandas as pd

# df is your DataFrame
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date")

def days_to_hit(x, no_hit_default=None):
    return next(
        ((df["Date"].iloc[j+x.name] - x["Date"]).days 
         for j in range(len(df)-x.name) 
         if df["Close"].iloc[j+x.name] >= x["Target1"]), no_hit_default)

df["days_to_hit_target"] = df.apply(days_to_hit, axis=1)

Upvotes: 0

SpaceMonkey55
SpaceMonkey55

Reputation: 457

NOTE I use python 3.7.1 and pandas 0.23.4. I came up with something very dirty; I am sure there is a neater and more efficient way of doing this.

### Create sample data
date_range = pd.date_range(start="1/1/2018", end="20/1/2018", freq="6H", closed="right")

target1 = np.random.uniform(10, 30, len(date_range))

close = [[i]*4 for i in np.random.uniform(10,30, len(date_range)//4)]
close_flat = np.array([item for sublist in close for item in sublist])

df = pd.DataFrame(np.array([np.array(date_range.date), target1,
    close_flat]).transpose(), columns=["date", "target", "close"])


### Create the column you need
# iterating over the days and finding days when the difference between
# "close" of current day and all "target" is lower than 0.25 OR the "target"
# value is greater than "close" value.
thresh = 0.25
date_diff_arr = np.zeros(len(df))
for i in range(0,len(df),4):
    diff_lt_thresh = df[(abs(df.target-df.close.iloc[i]) < thresh) | (df.target > df.close.iloc[i])]
    # only keep the findings from the next day onwards
    diff_lt_thresh = diff_lt_thresh.loc[i+4:]
    if not diff_lt_thresh.empty:
        # find day difference only if something under thresh is found
        days_diff = (diff_lt_thresh.iloc[0].date - df.iloc[i].date).days
    else:
        # otherwise write it as nan
        days_diff = np.nan
    # fill in the np.array which will be used to write to the df
    date_diff_arr[i:i+4] = days_diff

df["date_diff"] = date_diff_arr

Sample output:

0   2018-01-01    21.64  26.7319        2.0
1   2018-01-01  22.9047  26.7319        2.0
2   2018-01-01  26.0945  26.7319        2.0
3   2018-01-02  10.2155  26.7319        2.0
4   2018-01-02  17.5602  11.0507        1.0
5   2018-01-02  12.0368  11.0507        1.0
6   2018-01-02  19.5923  11.0507        1.0
7   2018-01-03  21.8168  11.0507        1.0
8   2018-01-03  11.5433  16.8862        1.0
9   2018-01-03  27.3739  16.8862        1.0
10  2018-01-03  26.9073  16.8862        1.0
11  2018-01-04  19.6677  16.8862        1.0
12  2018-01-04  25.3599  27.3373        1.0
13  2018-01-04  22.7479  27.3373        1.0
14  2018-01-04  18.7246  27.3373        1.0
15  2018-01-05  25.4122  27.3373        1.0
16  2018-01-05  28.3294  23.8469        1.0

Upvotes: 1

micric
micric

Reputation: 671

This should work:

daysAboveTarget = []
for i in range(len(df.Date)):
    try:
        dayAboveTarget = df.iloc[i:].loc[(df.Close > df.Target1[i])]['Date'].iloc[0]
    except IndexError:
        dayAboveTarget = None
    daysAboveTarget.append(dayAboveTarget)
daysAboveTarget = pd.Series(daysAboveTarget)
df['days_to_hit_target'] = daysAboveTarget - df.Date

I sort of overused iloc and loc here, so let me explain. The variable dayAboveTarget gets the date when the price closes above the target. The first iloc subsets the dataframe to only future dates, the first loc finds the actual results, the second iloc gets only the first result. We need the exception for days where the price never goes above target.

Upvotes: 1

Related Questions