Reputation: 2805
I have a dataframe with multiple columns. One of the columns (B in the example) acts as a trigger: I have to drop all rows after the first value greater than 0.5, but I have to keep the row containing that first value.
An example is given below. All rows after 0.59 (the first value that satisfies the condition of being greater than 0.5) are deleted.
initial_df = pd.DataFrame([[1,0.4], [5,0.43], [4,0.59], [11,0.41], [9,0.61]], columns = ['A', 'B'])
Below, the blue box indicates the trigger and the red box the rows that have to be dropped.
The final goal is to obtain the following dataframe:
Is it possible to do this in pandas in an efficient way (not using a for loop)?
Upvotes: 0
Views: 1950
Reputation: 164813
You can use np.where with Boolean indexing to extract the positional index of the first value matching a condition. Then feed this to iloc:
idx = np.where(df['B'].gt(0.5))[0][0]
res = df.iloc[:idx+1]
print(res)
A B
0 1 0.40
1 5 0.43
2 4 0.59
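One caveat with the snippet above: if no value in B exceeds 0.5, the `[0][0]` lookup raises an IndexError. Here is a minimal sketch of a guarded version. The helper name `keep_through_first_match` is hypothetical, and returning the whole frame when nothing matches is an assumption about the desired behaviour, not something stated in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 0.4], [5, 0.43], [4, 0.59], [11, 0.41], [9, 0.61]],
                  columns=['A', 'B'])

def keep_through_first_match(frame, column, threshold):
    """Keep rows up to and including the first row where column > threshold.

    Hypothetical helper: if no row matches, the whole frame is returned
    (an assumption about what "no trigger" should mean).
    """
    # Positions (not labels) of every row that satisfies the condition
    positions = np.flatnonzero(frame[column].to_numpy() > threshold)
    if positions.size == 0:
        return frame.copy()
    # Slice by position, including the trigger row itself
    return frame.iloc[:positions[0] + 1]

res = keep_through_first_match(df, 'B', 0.5)
no_trigger = keep_through_first_match(df, 'B', 0.9)
```

Because the slice uses positions rather than labels, this should behave the same way whatever the index looks like.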
For very large dataframes where the condition is likely to be met early on, it is more efficient to use next with a generator expression to calculate idx, since iteration stops at the first match:
idx = next((idx for idx, val in enumerate(df['B']) if val > 0.5), len(df.index))
For better performance, see Efficiently return the index of the first value satisfying condition in array.
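For completeness, here is how the generator version fits together end to end, as a small sketch on the question's data. The default of `len(df.index)` means that when nothing matches, the slice simply returns every row; the `late_idx` name and the 0.9 threshold are only there to illustrate that case.

```python
import pandas as pd

df = pd.DataFrame([[1, 0.4], [5, 0.43], [4, 0.59], [11, 0.41], [9, 0.61]],
                  columns=['A', 'B'])

# Stops iterating as soon as the first value above 0.5 is seen
idx = next((i for i, val in enumerate(df['B']) if val > 0.5), len(df.index))
res = df.iloc[:idx + 1]

# Illustration of the fallback: no value exceeds 0.9, so idx is len(df.index)
late_idx = next((i for i, val in enumerate(df['B']) if val > 0.9), len(df.index))
everything = df.iloc[:late_idx + 1]
```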
Upvotes: 3
Reputation: 979
This works if your index labels are the same as the row positions used by iloc (for example, the default integer index):
first_occurence = initial_df[initial_df.B>0.5].index[0]
initial_df.iloc[:first_occurence+1]
EDIT: this is a more general solution, which converts the label of the first match to a position:
first_occurence = initial_df.index.get_loc(initial_df[initial_df.B>0.5].iloc[0].name)
final_df = initial_df.iloc[:first_occurence+1]
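To see why the label-to-position conversion matters, here is a small sketch with a non-default index. The string labels are made up for illustration, and the sketch assumes the labels are unique, so that get_loc returns a single integer.

```python
import pandas as pd

# Same data as in the question, but with hypothetical string labels
initial_df = pd.DataFrame(
    [[1, 0.4], [5, 0.43], [4, 0.59], [11, 0.41], [9, 0.61]],
    columns=['A', 'B'],
    index=['a', 'b', 'c', 'd', 'e'],
)

# Label of the first row where B exceeds 0.5
first_label = initial_df[initial_df.B > 0.5].index[0]
# Convert that label to an integer position (assumes unique labels)
first_pos = initial_df.index.get_loc(first_label)
final_df = initial_df.iloc[:first_pos + 1]
```

With these labels, the first snippet would fail because `'c' + 1` is not a valid position, whereas the position from get_loc can be used with iloc directly.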
Upvotes: 3
Reputation: 2805
I found a solution similar to the one shown by jpp:
indices = initial_df.index
trigger = initial_df[initial_df.B > 0.5].index[0]
initial_df[initial_df.index.isin(indices[indices<=trigger])]
Since the real dataframe has multiple indices, this is the only solution that I found.
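Another option that never touches the index labels is a cumulative mask. This is a different technique from the snippets above, sketched here under the assumption that "multiple indices" means a MultiIndex; the index values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index standing in for the real dataframe
index = pd.MultiIndex.from_tuples(
    [('x', 1), ('x', 2), ('y', 1), ('y', 2), ('z', 1)],
    names=['group', 'item'],
)
df = pd.DataFrame({'A': [1, 5, 4, 11, 9],
                   'B': [0.4, 0.43, 0.59, 0.41, 0.61]}, index=index)

hit = df['B'] > 0.5
# For each row, count the hits that occurred strictly before it
hits_before = hit.cumsum().shift(fill_value=0)
# Keep rows with no earlier hit: everything up to and including the trigger
res = df[hits_before.eq(0)]
```

If no row exceeds the threshold, the mask is True everywhere and the whole dataframe is kept.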
Upvotes: 2
Reputation: 2018
I am assuming you want to delete all rows where the "B" column value is less than 0.5.
Try this:
initial_df = pd.DataFrame([[1, 0.4], [5, 0.43], [4, 0.59], [11, 0.41], [9, 0.61]], columns=['A', 'B'])
final_df = initial_df[initial_df['B'] >= 0.5]
The resulting dataframe, final_df, is:
A B
2 4 0.59
4 9 0.61
Upvotes: 1