Finding the index for a value in a Pandas Dataframe

Question

I've got a problem that shouldn't be that difficult but it's stumping me. There has to be an easy way to do it. I have a series from a dataframe that looks like this:

               value

2001-01-04     0.134
2001-01-05      NaN
2001-01-06      NaN
2001-01-07     0.032
2001-01-08      NaN
2001-01-09     0.113
2001-01-10      NaN
2001-01-11      NaN
2001-01-12     0.112
2001-01-13      NaN
2001-01-14      NaN
2001-01-15     0.136
2001-01-16      NaN
2001-01-17      NaN

Iterating from bottom to top, I need to find the index of the first value greater than 0.100 whose next earlier value is less than 0.100.

So in the series above, I want to find the index of the value 0.113, which is 2001-01-09. The next earlier value is below 0.100 (0.032 on 2001-01-07). The two later values are also greater than 0.100, but I want the index of the earliest value > 0.100 that follows a value less than the threshold, iterating bottom to top.

The only way I can think of doing this is reversing the series, iterating to the first (last) value, checking if it is > 0.100, then iterating to the next earlier value and checking whether it's less than 0.100. If it is, I'm done. If it's > 0.100, I have to iterate again and test the earlier number.

Surely there is a non-messy way to do this I'm not seeing that avoids all this stepwise iteration.

Thanks in advance for your help.

root · Accepted Answer

You're essentially looking for two conditions. For the first condition, you want the given value to be greater than 0.1:

df['value'].gt(0.1)

For the second condition, you want the previous non-null value to be less than 0.1:

df['value'].ffill().shift().lt(0.1)
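To see what the second condition is doing, here is a small sketch that rebuilds the series from the question and shows the previous non-null value next to each row. The construction of df is my assumption about how the data is laid out (a daily DatetimeIndex and a single value column), and prev_valid is just an illustrative name:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the data shown in the question.
index = pd.date_range("2001-01-04", "2001-01-17", freq="D")
values = [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
          np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]
df = pd.DataFrame({"value": values}, index=index)

# ffill carries the last non-null value forward; shift then moves it down
# one row, so each row sees the most recent non-null value strictly before it.
prev_valid = df["value"].ffill().shift()

# Show the original values and the "previous non-null" values side by side.
print(pd.DataFrame({"value": df["value"], "prev_valid": prev_valid}))
```

The first row has no earlier value, so prev_valid is NaN there and lt(0.1) evaluates to False for it.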

Now, combine the two conditions with the & operator, reverse the resulting Boolean indexer, and use idxmax to find the first (last) instance where your condition holds:

(df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1].idxmax()

This gives the expected index value, 2001-01-09.
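For anyone who wants to run this end to end, here is a minimal sketch on the question's data. The way df is built is an assumption on my part (daily DatetimeIndex, one value column), and the intermediate names is_above, prev_below, and crossing are only there for readability:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the data shown in the question.
index = pd.date_range("2001-01-04", "2001-01-17", freq="D")
values = [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
          np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]
df = pd.DataFrame({"value": values}, index=index)

# Condition 1: the value itself is above the threshold (NaN compares False).
is_above = df["value"].gt(0.1)

# Condition 2: the previous non-null value is below the threshold.
prev_below = df["value"].ffill().shift().lt(0.1)

# Rows where both conditions hold.
crossing = is_above & prev_below

# Reverse so that idxmax returns the latest date where the condition is True.
result = crossing[::-1].idxmax()
print(result)
```

On this data only one row satisfies both conditions, so the reversal makes no difference here. It matters when the series crosses the threshold more than once and you want the most recent crossing.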

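The above method assumes that at least one value satisfies the condition you've described. If none does, idxmax on an all-False Boolean series simply returns the first label (here, the last date, because the series is reversed), which is not a valid answer. If it's possible that your data may not satisfy the condition, you may want to use any to verify that a solution exists: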

# Build the condition.
cond = (df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1]

# Check if the condition is met anywhere.
if cond.any():
    idx = cond.idxmax()
else:
    # No match: None is only a placeholder, use whatever fallback suits your data.
    idx = None

In your question, you've specified both inequalities to be strict. What happens for a value exactly equal to 0.1? You may want to change one of the gt/lt to ge/le to account for this.
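As a rough illustration of why the choice matters, here is a sketch on a small made-up series (not the question's data) that contains a value exactly equal to the threshold. The names s, prev, strict, with_ge, and with_le are hypothetical:

```python
import numpy as np
import pandas as pd

# Made-up data with one value sitting exactly on the 0.1 threshold.
index = pd.date_range("2001-01-01", periods=5, freq="D")
s = pd.Series([0.05, np.nan, 0.1, np.nan, 0.2], index=index)

# Previous non-null value for each row.
prev = s.ffill().shift()

# Both inequalities strict: the 0.1 value is neither "above" nor "below".
strict = s.gt(0.1) & prev.lt(0.1)

# Treat a value equal to the threshold as "above".
with_ge = s.ge(0.1) & prev.lt(0.1)

# Treat a value equal to the threshold as "below".
with_le = s.gt(0.1) & prev.le(0.1)

print(strict.any())              # no row qualifies under strict inequalities
print(with_ge[::-1].idxmax())    # the 0.1 row itself counts as the crossing
print(with_le[::-1].idxmax())    # the 0.2 row counts, since 0.1 is "below"
```

With strict inequalities no crossing is found at all in this series, while the ge and le variants each find one, on different dates. Which behaviour is right depends on how you want to classify a value that sits exactly on the threshold.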
