Reputation: 1573
I want to find the first value after each row that meets a certain criteria. So for example I want to find the first rate/value (not necessarily the first row after) after the current row that increased 5%. The added column would be the last 'first5percentIncrease' and would be the index (and/or value) of the first row (after current row) that had a 5% increase. Notice how each could not be lower than the current row's index.
amount date rate total type first5percentIncreaseValue first5percentIncreaseIndex
9248 0.05745868 2018-01-22 06:11:36 10 0.00099984 buy 10.5 9341
9249 1.14869147 2018-01-22 06:08:38 20 0.01998989 buy 21 9421
9250 0.16498080 2018-01-22 06:02:59 15 0.00286241 sell 15.75 9266
9251 0.02881844 2018-01-22 06:01:54 2 0.00049999 sell 2.1 10911
I tried using loc to apply() this to each row. The output takes at least 10 seconds for only about 9k rows. This does the job (I get a list of all values 5% higher than the given row) but is there a more efficient way to do this? Also I'd like to get only the first value but when I take do this I think it's starting from the first row. Is there a way to start .locs search from the current row so then I could just take the first value?
coin_trade_history_df['rate'].apply(
lambda y: coin_trade_history_df['rate'].loc[coin_trade_history_df['rate'].apply(
lambda x: y >= x + (x*.005))])
0 [0.01387146, 0.01387146, 0.01387148, 0.0138714...
1 [0.01387146, 0.01387146, 0.01387148, 0.0138714...
2 [0.01387146, 0.01387146, 0.01387148, 0.0138714...
3 [0.01387146, 0.01387146, 0.01387148, 0.0138714...
4 [0.01387146, 0.01387146, 0.01387148, 0.0138714...
Name: rate, dtype: object
Further clarification Peter Leimbigler said it better than me:
Oh, I think I get it now! "For each row, scan downward and get the first row you encounter that shows an increase of at least 5%," right? I'll edit my answer :) – Peter Leimbigler
Upvotes: 2
Views: 3692
Reputation: 1573
Peter's answer was much faster but it only looked at the immediate next row. I wanted it to perform this on every row. Below is what I ended up with - not very fast but it goes through each row and returns the first value (or last value in my case since my time series was descending) that satisfied my criteria (increasing 5%).
def test_rows(x):
return trade_history_df['rate'].loc[
trade_history_df['rate'] >= x['rate'] + (x['rate'] * .05)].loc[
trade_history_df['date'] > x['date']].last_valid_index()
test1 = trade_history_df[['rate','date']].apply(test_rows,axis = 1)
Upvotes: 1
Reputation: 11105
Here's an approach to the specific example of labeling each row with the index of the next available row that shows an increase of at least 5%.
# Example data
df = pd.DataFrame({'rate': [100, 105, 99, 110, 130, 120, 98]})
# Series.shift(n) moves elements n places forward = down. We use
# it here in the denominator in order to compare each change with
# the initial value, rather than the final value.
mask = df.rate.diff()/df.rate.shift() >= 0.05
df.loc[mask, 'next_big_change_idx'] = df[mask].index
df.next_big_change_idx = df.next_big_change_idx.bfill().shift(-1)
# output
df
rate next_big_change_idx
0 100 1.0
1 105 3.0
2 99 3.0
3 110 4.0
4 130 NaN
5 120 NaN
6 98 NaN
Upvotes: 1