Reputation: 173
I have a dataframe, I am struggling to create a column based out of other columns, I will share the problem for a sample data.
Date Target1 Close
0 2019-04-17 209.2440 203.130005
1 2019-04-17 212.2155 203.130005
2 2019-04-17 213.6330 203.130005
3 2019-04-17 213.0555 203.130005
4 2019-04-17 212.6250 203.130005
5 2019-04-17 212.9820 203.130005
6 2019-04-17 213.1395 203.130005
7 2019-04-16 209.2860 199.250000
8 2019-04-16 209.9055 199.250000
9 2019-04-16 210.3045 199.250000
I want to create another column (for each observation) (called days_to_hit_target
for example) which is the difference of days such that close hits (or comes very close to target of specific day), when it does closes that very closely, then it counts the difference of days and put them in the column days_to_hit_target.
Upvotes: 0
Views: 97
Reputation: 1858
maybe a little faster solution:
import pandas as pd
# df is your DataFrame
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date")
def days_to_hit(x, no_hit_default=None):
return next(
((df["Date"].iloc[j+x.name] - x["Date"]).days
for j in range(len(df)-x.name)
if df["Close"].iloc[j+x.name] >= x["Target1"]), no_hit_default)
df["days_to_hit_target"] = df.apply(days_to_hit, axis=1)
Upvotes: 0
Reputation: 457
NOTE I use python 3.7.1 and pandas 0.23.4. I came up with something very dirty; I am sure there is a neater and more efficient way of doing this.
### Create sample data
date_range = pd.date_range(start="1/1/2018", end="20/1/2018", freq="6H", closed="right")
target1 = np.random.uniform(10, 30, len(date_range))
close = [[i]*4 for i in np.random.uniform(10,30, len(date_range)//4)]
close_flat = np.array([item for sublist in close for item in sublist])
df = pd.DataFrame(np.array([np.array(date_range.date), target1,
close_flat]).transpose(), columns=["date", "target", "close"])
### Create the column you need
# iterating over the days and finding days when the difference between
# "close" of current day and all "target" is lower than 0.25 OR the "target"
# value is greater than "close" value.
thresh = 0.25
date_diff_arr = np.zeros(len(df))
for i in range(0,len(df),4):
diff_lt_thresh = df[(abs(df.target-df.close.iloc[i]) < thresh) | (df.target > df.close.iloc[i])]
# only keep the findings from the next day onwards
diff_lt_thresh = diff_lt_thresh.loc[i+4:]
if not diff_lt_thresh.empty:
# find day difference only if something under thresh is found
days_diff = (diff_lt_thresh.iloc[0].date - df.iloc[i].date).days
else:
# otherwise write it as nan
days_diff = np.nan
# fill in the np.array which will be used to write to the df
date_diff_arr[i:i+4] = days_diff
df["date_diff"] = date_diff_arr
Sample output:
0 2018-01-01 21.64 26.7319 2.0
1 2018-01-01 22.9047 26.7319 2.0
2 2018-01-01 26.0945 26.7319 2.0
3 2018-01-02 10.2155 26.7319 2.0
4 2018-01-02 17.5602 11.0507 1.0
5 2018-01-02 12.0368 11.0507 1.0
6 2018-01-02 19.5923 11.0507 1.0
7 2018-01-03 21.8168 11.0507 1.0
8 2018-01-03 11.5433 16.8862 1.0
9 2018-01-03 27.3739 16.8862 1.0
10 2018-01-03 26.9073 16.8862 1.0
11 2018-01-04 19.6677 16.8862 1.0
12 2018-01-04 25.3599 27.3373 1.0
13 2018-01-04 22.7479 27.3373 1.0
14 2018-01-04 18.7246 27.3373 1.0
15 2018-01-05 25.4122 27.3373 1.0
16 2018-01-05 28.3294 23.8469 1.0
Upvotes: 1
Reputation: 671
This should work:
daysAboveTarget = []
for i in range(len(df.Date)):
try:
dayAboveTarget = df.iloc[i:].loc[(df.Close > df.Target1[i])]['Date'].iloc[0]
except IndexError:
dayAboveTarget = None
daysAboveTarget.append(dayAboveTarget)
daysAboveTarget = pd.Series(daysAboveTarget)
df['days_to_hit_target'] = daysAboveTarget - df.Date
I sort of overused iloc and loc here, so let me explain. The variable dayAboveTarget gets the date when the price closes above the target. The first iloc subsets the dataframe to only future dates, the first loc finds the actual results, the second iloc gets only the first result. We need the exception for days where the price never goes above target.
Upvotes: 1