Reputation: 1395
Assuming I have the following Pandas DataFrame:
U A B
0 2000 10 20
1 3000 40 0
2 2100 20 30
3 2500 0 30
4 2600 30 40
How can I get the index of first row that both A and B have non-zero value and (A+B)/2 is larger than 15
?
In this example, I would like to get 2
since it is the first row that have non-zero A and B column and avg value of 25
which is more than 15
Note that this DataFrame is huge, I am looking for the fastest way to the index value.
Upvotes: 1
Views: 648
Reputation: 1463
I find more readable explicit variables, like:
AB2 = (df['A']+df['B'])/2
filter = (df['A'] != 0) & (df['B'] != 0) & (AB2>15)
your_index = df[filter].index[0]
Performance For this use case (ridiculous dataset)
%%timeit
df[(df.A.ne(0)&df.B.ne(0))&((df.A+df.B)/2).gt(15)].first_valid_index()
**1.21 ms** ± 35.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
AB2 = (df['A']+df['B'])/2
filter = (df['A'].ne(0)) & (df['B'].ne(0)) & (AB2>15)
df[filter].index[0]
**1.08 ms** ± 28.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
df.query("A!=0 and B!=0 and (A+B)/2 > 15").index[0]
**2.71 ms** ± 157 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 1
Reputation: 28709
If the dataframe is large, query
might be faster:
df.query("A!=0 and B!=0 and (A+B)/2 > 15").index[0]
2
Upvotes: 0
Reputation: 26676
Lets try:
df[(df.A.ne(0)&df.B.ne(0))&((df.A+df.B)/2).gt(15)].first_valid_index()
Upvotes: 5