Tung Nguyen

Reputation: 390

How to write recursion in dataframe?

I have a dataframe like this:

    Price    Signal
0   28.68     -1
1   33.36      1
2   44.7      -1
3   43.38      1 ---- smaller than Price[2] # False: Drop row[3,4]
4   41.67     -1
5   42.17      1 ---- smaller than Price[2] # False: Drop row[5,6]
6   44.21     -1
7   46.34      1 ---- greater than Price[2] # True: Keep
8   45.2      -1 
9   43.4       1 ---- Still Keep because it is the last row

My logic is: keep a row with Signal 1 only if its price is greater than the price of the -1 row before it. If it is not, drop that row and the next one, since the signals must alternate between -1 and 1, and then compare the next Signal 1 row against the last kept -1 row (as annotated in the dataframe above).

The last Signal 1 row is kept even though it does not satisfy the condition, because the last item in the Signal column must be 1.

Here is my attempt so far:

def filter_sell(df):
    # For export the result
    filtered_sell_df = pd.DataFrame()

    for i in range(0, len(df) + 1):
        if df.iloc[i]["Signal"] == 1:
            if df.iloc[i]["Price"] > df.iloc[i - 1]["Price"]:
                pass
            else:
                try:
                    df.drop([i, i + 1])
                    filter_sell(df)
                # Try to handle the i + 1 above since len(df) is changed
                except RecursionError:
                    break
        else:
            pass

I'm new to writing recursion. Thanks for your help!

Upvotes: 1

Views: 243

Answers (1)

Danila Ganchar

Reputation: 11222

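You can do it without recursion. By the way, your approach will be slow because you call .drop() inside a loop, and .drop() returns a new dataframe by default instead of modifying df, so its result is discarded in your code. The easiest way is to use a new column to mark rows for deletion.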

import pandas as pd

df = pd.DataFrame({
    'Price': (28.68, 33.36, 44.7, 43.38, 41.67, 42.17, 44.21, 46.34, 45.2, 43.4),
    'Signal': (-1, 1, -1, 1, -1, 1, -1, 1, -1, 1),
})


# flag column: 1 = keep, 0 = remove (despite its name, it does not hold a price)
df['max_price'] = 1
# default max_price in first row
max_price = df['Price'].loc[0]
index = 1
# because we do not check last record
stop_index = len(df.index) - 1

while index < stop_index:
    # just check max price because signal != 1
    if df['Signal'].loc[index] == -1:
        current = df['Price'].loc[index]
        if current > max_price:
            max_price = current
        index += 1
        continue

    current = df['Price'].loc[index]
    if max_price > current:
        # the running max_price is greater than the current price:
        # set the 'remove' flag (0) on the current and next row
        df.loc[index, 'max_price'] = 0
        df.loc[index + 1, 'max_price'] = 0
        # advance by 2 because the next row is removed as well
        index += 2
        continue

    index += 1


# keep only the rows still flagged 1, then drop the helper column
df = df[df['max_price'] == 1]
df = df.drop(columns=['max_price'])
print(df)
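
If you would rather follow the rule exactly as worded in the question (compare each Signal 1 price with the last kept -1 price, instead of the running maximum used above), here is a minimal sketch of the same single-pass idea as a plain function. The names keep_mask and ref_price are made up for illustration, and the sketch assumes the signals strictly alternate, starting with -1 and ending with 1. I have only checked it against the sample data from the question.

```python
def keep_mask(prices, signals):
    """Return a list of booleans, True for every row that should be kept.

    Assumption: signals alternate -1, 1, -1, 1, ... and the last one is 1.
    """
    n = len(prices)
    keep = [True] * n
    ref_price = None  # price of the most recent kept -1 row
    i = 0
    while i < n - 1:  # the final row is never checked, so it is always kept
        if signals[i] == -1:
            ref_price = prices[i]
            i += 1
        elif ref_price is not None and prices[i] <= ref_price:
            # Signal 1 is not greater than the kept -1 price:
            # drop this row and the -1 row that follows it
            keep[i] = False
            keep[i + 1] = False
            i += 2
        else:
            i += 1
    return keep


prices = [28.68, 33.36, 44.7, 43.38, 41.67, 42.17, 44.21, 46.34, 45.2, 43.4]
signals = [-1, 1, -1, 1, -1, 1, -1, 1, -1, 1]

mask = keep_mask(prices, signals)
kept_rows = [i for i, flag in enumerate(mask) if flag]
print(kept_rows)
```

With a dataframe, something like df[keep_mask(df['Price'].tolist(), df['Signal'].tolist())] should then select the kept rows without a helper column.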

Hope this helps.

Upvotes: 1
