nunodsousa

Reputation: 2805

Delete all rows below a certain condition in pandas

I have a dataframe with multiple columns. One of the columns (denoted as B in the example) works as a trigger: I have to drop all rows after the first value greater than 0.5, while keeping the row that contains that first value.

An example is given below. All rows after 0.59 (the first value that satisfies the condition of being greater than 0.5) are deleted.

import pandas as pd

initial_df = pd.DataFrame([[1,0.4], [5,0.43], [4,0.59], [11,0.41], [9,0.61]], columns = ['A', 'B'])

[image]

Below, the blue box indicates the trigger and the red box indicates the rows that have to be dropped. In the end we will have:

[image]

The final goal is to obtain the following dataframe:

[image]

Is it possible to do this in pandas in an efficient way (without using a for loop)?

Upvotes: 0

Views: 1950

Answers (4)

jpp

Reputation: 164813

You can use np.where on a Boolean mask to extract the positional index of the first value matching a condition. Then feed this to iloc:

import numpy as np

# df is the question's initial_df; [0][0] picks the position of the first True
idx = np.where(df['B'].gt(0.5))[0][0]
res = df.iloc[:idx+1]

print(res)

   A     B
0  1  0.40
1  5  0.43
2  4  0.59

For very large dataframes where the condition is likely to be met early on, it is more efficient to use next with a generator expression to calculate idx, since iteration stops at the first match:

idx = next((idx for idx, val in enumerate(df['B']) if val > 0.5), len(df.index))

For better performance, see Efficiently return the index of the first value satisfying condition in array.
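
As a rough sketch of how the pieces above fit together, the helper below wraps the lookup in a function. The name `keep_through_first_match` is hypothetical and not part of pandas or NumPy. It uses `np.argmax` on the Boolean mask instead of `np.where`, and it assumes that when no value exceeds the threshold the whole frame should be kept (a case in which `np.where(...)[0][0]` would raise an `IndexError`).

```python
import numpy as np
import pandas as pd


def keep_through_first_match(df, column, threshold):
    """Return rows up to and including the first row where df[column] > threshold."""
    mask = df[column].gt(threshold).to_numpy()
    if not mask.any():
        # Assumption: if nothing triggers the condition, keep every row.
        return df
    # np.argmax on a Boolean array returns the position of the first True.
    idx = int(np.argmax(mask))
    return df.iloc[: idx + 1]


df = pd.DataFrame(
    [[1, 0.4], [5, 0.43], [4, 0.59], [11, 0.41], [9, 0.61]], columns=["A", "B"]
)
res = keep_through_first_match(df, "B", 0.5)
print(res)
```

Because the slice is positional, this does not depend on what the index labels look like.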

Upvotes: 3

onno

Reputation: 979

This works if your index labels are the same as the row positions used by iloc (for example, the default 0, 1, 2, ... index):

first_occurence = initial_df[initial_df.B>0.5].index[0]
initial_df.iloc[:first_occurence+1]

EDIT: this is a more general solution that works for any unique index:

first_occurence = initial_df.index.get_loc(initial_df[initial_df.B>0.5].iloc[0].name)
final_df = initial_df.iloc[:first_occurence+1]
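
To illustrate the difference between the two versions, here is a small sketch on a hypothetical frame whose index labels (10, 20, ...) do not match the row positions. With such an index the first version would slice with `iloc[:31]` and silently return every row, whereas translating the label to a position with `get_loc` gives the intended result. The frame and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions.
df = pd.DataFrame(
    {"A": [1, 5, 4, 11, 9], "B": [0.4, 0.43, 0.59, 0.41, 0.61]},
    index=[10, 20, 30, 40, 50],
)

first_label = df[df.B > 0.5].index[0]      # label of the trigger row
first_pos = df.index.get_loc(first_label)  # position of that label (unique index)
final_df = df.iloc[: first_pos + 1]
print(final_df)
```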

Upvotes: 3

nunodsousa

Reputation: 2805

I found a solution similar to the one shown by jpp:

indices = initial_df.index
trigger = initial_df[initial_df.B > 0.5].index[0]
initial_df[initial_df.index.isin(indices[indices<=trigger])]

Since the real dataframe has multiple indices, this is the only solution I found that works for it.
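
For comparison, here is a sketch of an alternative that never looks at the index labels and so does not rely on them being sorted or comparable. It assumes "multiple indices" means a MultiIndex. The frame below, its level names, and the variable names are hypothetical. The idea is to count how many trigger rows have been seen strictly before each row and keep only the rows where that count is zero.

```python
import pandas as pd

# Hypothetical MultiIndex frame standing in for the real data.
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2), ("z", 1)], names=["k1", "k2"]
)
df = pd.DataFrame(
    {"A": [1, 5, 4, 11, 9], "B": [0.4, 0.43, 0.59, 0.41, 0.61]}, index=index
)

hit = df["B"].gt(0.5)
# Number of trigger rows seen strictly before each row.
hits_before = hit.cumsum().shift(fill_value=0)
# Keep rows with no earlier trigger, which includes the trigger row itself.
final_df = df[hits_before.eq(0)]
print(final_df)
```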

Upvotes: 2

emirc

Reputation: 2018

I am assuming you want to delete all rows where the value in column "B" is less than 0.5.

Try this:

initial_df = pd.DataFrame([[1, 0.4], [5, 0.43], [4, 0.59], [11, 0.41], [9, 0.61]], columns=['A', 'B'])

final_df = initial_df[initial_df['B'] >= 0.5]

The resulting dataframe, final_df, is:

   A     B
2  4  0.59
4  9  0.61

Upvotes: 1
