Reputation: 611
I have a pandas DataFrame like this:
import pandas as pd
raw_data = [{'Date': '1-10-19', 'Price':7, 'Check': 0},
{'Date': '2-10-19','Price':8.5, 'Check': 0},
{'Date': '3-10-19','Price':9, 'Check': 1},
{'Date': '4-10-19','Price':50, 'Check': 1},
{'Date': '5-10-19','Price':80, 'Check': 1},
{'Date': '6-10-19','Price':100, 'Check': 1}]
df = pd.DataFrame(raw_data)
df.set_index('Date')
This is what it looks like:
Price Check
Date
1-10-19 7.0 0
2-10-19 8.5 0
3-10-19 9.0 1
4-10-19 50.0 1
5-10-19 80.0 1
6-10-19 100.0 1
For each row where 'Check' is 1, I want to count how many rows back I have to go to reach a row whose price is less than 10% of that row's price. For example, for the 6th row, where the price is 100, I want to iterate over the previous rows until the price is less than 10 (10% of 100), which in this case is 3 rows prior, where the price is 9. I then want to save the results in a new column.
The final result would look like this:
Price Check Rows_till_small
Date
1-10-19 7.0 0 NaN
2-10-19 8.5 0 NaN
3-10-19 9.0 1 NaN
4-10-19 50.0 1 NaN
5-10-19 80.0 1 4
6-10-19 100.0 1 3
I've thought a lot about how I could do this with some kind of rolling function, but I don't think it's possible. I've also thought about iterating through the entire DataFrame using iterrows or itertuples, but I can't think of a way to do it that isn't extremely inefficient.
Upvotes: 2
Views: 3153
Reputation: 3836
You can solve the issue the following way:
import pandas as pd
raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
{'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
{'Date': '3-10-19', 'Price': 9, 'Check': 1},
{'Date': '4-10-19', 'Price': 50, 'Check': 1},
{'Date': '5-10-19', 'Price': 80, 'Check': 1},
{'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)
new_column = [None] * len(df)  # values for the new column, None where no match
for i in range(len(df)):
    if df['Check'][i] == 1:
        percent_10 = df['Price'][i] * 0.1  # 10% of the current row's price
        # walk backwards from the current row to the first row
        for j in range(i, -1, -1):
            if df['Price'][j] < percent_10:
                new_column[i] = i - j  # number of rows back to the match
                break
df["New"] = new_column  # add new column
print(df)
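If the nested loop gets slow on a larger frame, the same backward scan can be sketched on plain NumPy arrays, which avoids the repeated df['Price'][i] lookups. This is only a rough sketch: the helper name rows_till_small and the frac parameter are made up for illustration, and it assumes the rows are already in the order you want to scan.

```python
import numpy as np
import pandas as pd

raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)


def rows_till_small(df, frac=0.1):
    """Hypothetical helper: rows back to the nearest earlier price below frac * price."""
    price = df["Price"].to_numpy(dtype=float)
    check = df["Check"].to_numpy()
    out = np.full(len(df), np.nan)  # NaN where Check is 0 or nothing matches
    for i in np.flatnonzero(check == 1):
        # positions of earlier rows whose price is below the threshold
        hits = np.flatnonzero(price[:i] < frac * price[i])
        if hits.size:
            out[i] = i - hits[-1]  # hits[-1] is the closest earlier match
    return out


df["Rows_till_small"] = rows_till_small(df)
print(df)
```

There is still one Python-level loop over the checked rows, but the inner scan is a single array comparison.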
Hope the answer is useful for you, feel free to ask questions.
Upvotes: 1
Reputation: 8033
Check this out
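import numpy as np
# diff[i][k] is True when row k's price is more than 10% of row i's price
diff = df['Price'].apply(lambda x: x > (df['Price'] * .1))
RTS = []
for i in range(len(df)):
    check = diff[i]
    ind = check.idxmax()  # first row whose price exceeds 10% of row i's price
    if ind != 0:
        val = (i - ind) + 1
    else:
        val = np.nan
    RTS.append(val)
df['Rows_till_small'] = RTS
print(df)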
Output
Date Price Check Rows_till_small
0 1-10-19 7.0 0 NaN
1 2-10-19 8.5 0 NaN
2 3-10-19 9.0 1 NaN
3 4-10-19 50.0 1 NaN
4 5-10-19 80.0 1 4.0
5 6-10-19 100.0 1 3.0
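For comparison, here is a sketch of a fully vectorized variant that searches only the rows before each row and also honours the 'Check' column. It is not the code above, just one possible alternative: the function name rows_till_small_vectorized and the frac parameter are assumptions for illustration, and it builds an n-by-n boolean matrix, so it only makes sense for frames of moderate size.

```python
import numpy as np
import pandas as pd

raw_data = [{'Date': '1-10-19', 'Price': 7, 'Check': 0},
            {'Date': '2-10-19', 'Price': 8.5, 'Check': 0},
            {'Date': '3-10-19', 'Price': 9, 'Check': 1},
            {'Date': '4-10-19', 'Price': 50, 'Check': 1},
            {'Date': '5-10-19', 'Price': 80, 'Check': 1},
            {'Date': '6-10-19', 'Price': 100, 'Check': 1}]
df = pd.DataFrame(raw_data)


def rows_till_small_vectorized(df, frac=0.1):
    """Hypothetical helper: broadcasting version of the backward search."""
    price = df["Price"].to_numpy(dtype=float)
    n = len(price)
    # small[i, j] is True when row j's price is below frac * row i's price
    small = price[None, :] < frac * price[:, None]
    # keep only strictly earlier rows (j < i)
    small &= np.tri(n, n, k=-1, dtype=bool)
    # column of the last True in each row, found via argmax on reversed columns
    last = n - 1 - np.argmax(small[:, ::-1], axis=1)
    # a result is valid only if a match exists and Check is 1
    found = small.any(axis=1) & (df["Check"].to_numpy() == 1)
    out = np.where(found, np.arange(n) - last, np.nan)
    return pd.Series(out, index=df.index, name="Rows_till_small")


df["Rows_till_small"] = rows_till_small_vectorized(df)
print(df)
```

The trade-off is memory: the boolean matrix grows quadratically with the number of rows.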
Upvotes: 2