Windstorm1981

Reputation: 2680

Finding the index for a value in a Pandas Dataframe

I've got a problem that shouldn't be that difficult but it's stumping me. There has to be an easy way to do it. I have a series from a dataframe that looks like this:

               value

2001-01-04     0.134
2001-01-05      Nan
2001-01-06      Nan
2001-01-07     0.032
2001-01-08      Nan
2001-01-09     0.113
2001-01-10      Nan
2001-01-11      Nan
2001-01-12     0.112
2001-01-13      Nan
2001-01-14      Nan
2001-01-15     0.136
2001-01-16      Nan
2001-01-17      Nan

Iterating from bottom to top (latest date to earliest), I need to find the index of the first value greater than 0.100 whose next earlier value is less than 0.100.

So in the series above, I want to find the index of the value 0.113, which is 2001-01-09. The next earlier value is below 0.100 (0.032 on 2001-01-07). The two later values are greater than 0.100, but I want the index of the earliest value > 0.100 following a value less than the threshold, iterating bottom to top.

The only way I can think of doing this is reversing the series, iterating to the first (last) value, checking if it is > 0.100, then iterating to the next earlier value and checking whether it is less than 0.100. If it is, I'm done. If it is > 0.100, I have to iterate again and test the earlier number.
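
For reference, the stepwise approach described above might look roughly like the sketch below. It assumes the Nan entries are real NaN values and that the index holds dates; the helper name last_crossing_loop is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (assumed: missing readings are NaN).
df = pd.DataFrame(
    {"value": [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
               np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]},
    index=pd.date_range("2001-01-04", "2001-01-17", freq="D"),
)


def last_crossing_loop(values, threshold=0.100):
    """Hypothetical helper: walk backwards over the non-null readings."""
    s = values.dropna()
    for pos in range(len(s) - 1, 0, -1):
        # Stop at the first reading above the threshold whose
        # next earlier reading is below it.
        if s.iloc[pos] > threshold and s.iloc[pos - 1] < threshold:
            return s.index[pos]
    return None  # no reading satisfies both conditions


result = last_crossing_loop(df["value"])
```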

Surely there is a non-messy way to do this I'm not seeing that avoids all this stepwise iteration.

Thanks in advance for your help.

Upvotes: 6

Views: 1021

Answers (2)

piRSquared

Reputation: 294526

Bookkeeping

# making sure `nan` are actually `nan`
df.value = pd.to_numeric(df.value, errors='coerce')
# making sure strings are actually dates
df.index = pd.to_datetime(df.index)

plan

  • dropna
  • sort_index
  • boolean series of less than 0.1
  • convert to integers to use in diff
  • diff - Your scenario happens when we go from < .1 to > .1. In this case, diff will be -1
  • idxmax - find the first -1

df.value.dropna().sort_index().lt(.1).astype(int).diff().eq(-1).idxmax()

2001-01-09 00:00:00
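
To make the chain easier to follow, here is a sketch that runs each step of the plan separately on the sample data from the question (assuming the missing readings are already real NaN values). The intermediate variable names are illustrative only. One thing worth noting: idxmax returns the earliest crossing in date order, while the question scans from bottom to top, so reversing the boolean series with [::-1] would pick the most recent crossing instead. On this sample the two agree.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (assumed: missing readings are NaN).
df = pd.DataFrame(
    {"value": [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
               np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]},
    index=pd.date_range("2001-01-04", "2001-01-17", freq="D"),
)

s = df["value"].dropna().sort_index()  # keep only real readings, oldest first
below = s.lt(0.1)                      # True where the reading is under 0.1
step = below.astype(int).diff()        # -1 marks "below" followed by "not below"
crossings = step.eq(-1)                # boolean mask of those transitions

first_crossing = crossings.idxmax()       # earliest crossing in date order
last_crossing = crossings[::-1].idxmax()  # most recent crossing (bottom to top)
```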

Correction to account for the flaw pointed out by @root: when no crossing exists, idxmax still returns the first label, so check any first.

diffs = df.value.dropna().sort_index().lt(.1).astype(int).diff().eq(-1)
diffs.idxmax() if diffs.any() else pd.NaT

editorial

This question highlights an important SO dynamic. Those of us who answer questions often do so by editing our answers until they are in a satisfactory state. I have observed that those of us who answer pandas questions are generally very helpful to each other as well as to those who ask questions.

In this post, I was well informed by @root and subsequently changed my post to reflect the added information. That alone makes @root's post very useful in addition to the other great information they provided.

Please recognize both posts and up vote as many useful posts as you can.

Thx

Upvotes: 4

root

Reputation: 33843

You're essentially looking for two conditions. For the first condition, you want the given value to be greater than 0.1:

df['value'].gt(0.1)

For the second condition, you want the previous non-null value to be less than 0.1:

df['value'].ffill().shift().lt(0.1)

Now, combine the two conditions with the and operator (&), reverse the resulting Boolean indexer, and use idxmax to find the first (last) instance where your condition holds:

(df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1].idxmax()

Which gives the expected index value.
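
As a sanity check, the two conditions can be built separately on the sample data from the question. This is only a sketch, assuming the missing readings are real NaN values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (assumed: missing readings are NaN).
df = pd.DataFrame(
    {"value": [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
               np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]},
    index=pd.date_range("2001-01-04", "2001-01-17", freq="D"),
)

v = df["value"]
above = v.gt(0.1)                       # NaN compares as False
prev_below = v.ffill().shift().lt(0.1)  # previous non-null reading is under 0.1
cond = above & prev_below               # both conditions on the same row

matches = list(cond[cond].index)        # every row where the condition holds
result = cond[::-1].idxmax()            # the latest such row
```

On this data only one row satisfies both conditions, so matches has a single entry and result is that same date.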

The above method assumes that at least one value satisfies the situation you've described. If it's possible that your data may not satisfy your situation you may want to use any to verify that a solution exists:

# Build the condition.
cond = (df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1]

# Check if the condition is met anywhere.
if cond.any():
    idx = cond.idxmax()
else:
    idx = None  # no match; use whatever sentinel suits your code (e.g. pd.NaT)
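
To see why the any check matters, consider a series that never dips below the threshold. In the sketch below (made-up data, and a hypothetical helper named last_upward_crossing), the unguarded idxmax call still returns a label, because on an all-False mask it simply reports the first position.

```python
import pandas as pd


def last_upward_crossing(values, threshold=0.1):
    """Hypothetical helper: latest label meeting both conditions, else None."""
    cond = (values.gt(threshold) & values.ffill().shift().lt(threshold))[::-1]
    return cond.idxmax() if cond.any() else None


# Made-up data: every reading is above the threshold, so nothing qualifies.
dates = pd.date_range("2001-01-01", periods=3, freq="D")
never_below = pd.Series([0.2, 0.3, 0.4], index=dates)

cond = (never_below.gt(0.1) & never_below.ffill().shift().lt(0.1))[::-1]
unguarded = cond.idxmax()                    # first label of the reversed mask, not a real match
guarded = last_upward_crossing(never_below)  # None, since no row qualifies
```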

In your question, you've specified both inequalities to be strict. What happens for a value exactly equal to 0.1? You may want to change one of the gt/lt to ge/le to account for this.
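
A small sketch of the boundary case, using made-up data in which one reading is exactly 0.1. With both inequalities strict, no row qualifies; switching gt to ge lets the 0.1 reading count as the crossing. Which behaviour is right depends on what the threshold means for your data.

```python
import pandas as pd

# Made-up data: the middle reading sits exactly on the threshold.
s = pd.Series([0.05, 0.1, 0.2],
              index=pd.date_range("2001-01-01", periods=3, freq="D"))

prev_below = s.ffill().shift().lt(0.1)  # previous reading strictly under 0.1

strict = s.gt(0.1) & prev_below         # 0.1 is neither above nor below
inclusive = s.ge(0.1) & prev_below      # 0.1 counts as reaching the threshold

strict_found = bool(strict.any())
inclusive_idx = inclusive[::-1].idxmax() if inclusive.any() else None
```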

Upvotes: 7
