Sofyan Sahrom

Reputation: 445

Finding the index of the first element (e.g. "True") from a series/column

How do I find the index of an element (e.g. "True") in a series or a column?

For example, I have a column in which I want to identify the first instance where an event occurs. So I write it as

Variable = df["Force"] < event

This creates a boolean series that is False until the first instance where the condition holds, at which point it becomes True. How do I then find the index of that data point?

Is there a better way?

Upvotes: 30

Views: 23019

Answers (4)

normanius

Reputation: 9772

Here is an all-pandas solution that I consider a little neater than some of the other answers. It is also able to handle the corner case where no value of the input series satisfies the condition.

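import pandas as pd

def first_index_ordered(mask):
    assert mask.index.is_monotonic_increasing
    assert mask.dtype == bool
    # With an increasing index, the smallest label among the True entries is the first match
    idx_min = mask[mask].index.min()
    # min() of an empty index is NaN, which signals that nothing matched
    return None if pd.isna(idx_min) else idx_min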

col = "foo"
thr = 42
mask = df[col] < thr
idx_first = first_index_ordered(mask)

The above assumes that mask has a value-ordered, monotonically increasing index. If this is not the case, we have to do a bit more:

def first_index_unordered(mask):
    assert mask.dtype == bool
    index = mask.index
    # This creates a RangeIndex, which is monotonic
    mask = mask.reset_index(drop=True)
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else index[idx_min] 

Of course, we can combine both cases in one function:

def first_index_where(mask):
    if mask.index.is_monotonic_increasing:
        return first_index_ordered(mask)
    else:
        return first_index_unordered(mask)
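
To see why the unordered case needs the extra step, here is a small sketch. It uses a condensed, position-based variant of the functions above (my own shorthand for illustration, not a replacement for them) and a made-up series whose index is decreasing.

```python
import pandas as pd

def first_index_where(mask):
    """Label of the first True in a boolean Series, or None if there is none."""
    assert mask.dtype == bool
    # Work on positions 0..n-1 so that min() means "earliest row"
    positions = mask.reset_index(drop=True)
    pos = positions[positions].index.min()
    return None if pd.isna(pos) else mask.index[pos]

# Illustrative data: the index labels decrease from row to row
s = pd.Series([5, 1, 4, 2], index=[30, 20, 10, 0])
mask = s < 3

smallest_label = mask[mask].index.min()  # 0: smallest label, not the first match
first_label = first_index_where(mask)    # 20: label of the first matching row
no_match = first_index_where(s < 0)      # None
```

Taking the minimum over labels only coincides with "first in row order" when the index is increasing, which is what the assertion in first_index_ordered guards against.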

Upvotes: 0

jpp

Reputation: 164703

Below is a non-pandas solution which I find easy to adapt:

import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))

next(idx for idx, x in zip(df.index, df.Force) if x < 3)  # d

It works by iterating to the first result of a generator expression.
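
One thing to keep in mind: next raises StopIteration when the generator is exhausted, i.e. when nothing matches. A minimal sketch of guarding against that with the second argument of next (the function name and the default of None are my own choices for illustration):

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), index=list('abcde'))

def first_label_below(series, threshold, default=None):
    """Label of the first value below threshold, or default if none exists."""
    matches = (idx for idx, x in zip(series.index, series) if x < threshold)
    # The second argument to next() is returned instead of raising StopIteration
    return next(matches, default)

found = first_label_below(df.Force, 3)    # 'd'
missing = first_label_below(df.Force, 0)  # None
```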

Pandas appears to perform poorly in comparison:

import numpy as np

df = pd.DataFrame(dict(Force=np.random.randint(0, 100000, 100000)))

n = 99900

%timeit df['Force'].lt(n).idxmin()
# 1000 loops, best of 3: 1.57 ms per loop

%timeit df.Force.where(df.Force > n).first_valid_index()
# 100 loops, best of 3: 1.61 ms per loop

%timeit next(idx for idx, x in zip(df.index, df.Force) if x > n)
# 10000 loops, best of 3: 100 µs per loop

Upvotes: 5

Tai

Reputation: 7994

You can also try first_valid_index with where.

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
df.Force.where(df.Force < 3).first_valid_index()
3

where will replace the part that does not meet the condition with np.nan by default. Then, we find the first valid index out of the series.
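
To make the intermediate step visible, here is a short sketch with the same data. It also shows what happens when nothing satisfies the condition: first_valid_index returns None.

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])

# Entries that fail the condition become NaN: [NaN, NaN, NaN, 2.0, 1.0]
masked = df.Force.where(df.Force < 3)
hit = masked.first_valid_index()  # 3

# No value is below 0, so every entry is NaN and there is no valid index
miss = df.Force.where(df.Force < 0).first_valid_index()  # None
```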


Or this: select only the entries where the condition holds (True compares equal to 1, so v == 1 picks out the True entries), then take the first item of the resulting index.

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = (df["Force"] < 3)
v[v == 1].index[0]

Bonus: if you need the index of the first appearance of each distinct value, you can use drop_duplicates.

df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"],  ["blue"], ["red"]], columns=["Force"])  
df.Force.drop_duplicates().reset_index()
    index   Force
0   0       yello
1   2       blue
2   3       red

Some more work...

df.Force.drop_duplicates().reset_index().set_index("Force").to_dict()["index"]
{'blue': 2, 'red': 3, 'yello': 0}

Upvotes: 8

piRSquared

Reputation: 294348

Use idxmax to find the first instance of the maximum value. In this case, True is the maximum value.

df['Force'].lt(event).idxmax()

Consider the sample df:

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
df

   Force
a      5
b      4
c      3
d      2
e      1

The first instance of Force being less than 3 is at index 'd'.

df['Force'].lt(3).idxmax()
'd'

Be aware that if no value of Force is less than 3, every entry of the mask is False, so the maximum is False and idxmax returns the first index label ('a' here) rather than signalling that nothing matched.

Also consider the alternative argmax:
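
A minimal sketch of guarding against that case by checking any() first (the helper name and the choice of None as the "not found" value are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), index=list('abcde'))

def first_true_label(mask):
    """Label of the first True in a boolean Series, or None if all False."""
    # Without the any() check, idxmax would return the first label for an all-False mask
    return mask.idxmax() if mask.any() else None

found = first_true_label(df['Force'].lt(3))    # 'd'
missing = first_true_label(df['Force'].lt(0))  # None
```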

df.Force.lt(3).values.argmax()
3

It returns the position of the first instance of maximal value. You can then use this to find the corresponding index value:

df.index[df.Force.lt(3).values.argmax()]
'd'

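Also, recent pandas versions provide argmax as a Series method that returns the integer position, so df.Force.lt(3).argmax() should work there without going through .values.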

Upvotes: 36
