Reputation: 445
How do I find the index of an element (e.g. "True") in a series or a column?
For example, I have a column in which I want to identify the first instance where an event occurs. So I write it as
Variable = df["Force"] < event
This creates a boolean Series that is False until the first instance where the condition holds, at which point it becomes True. How do I then find the index of that data point?
Is there a better way?
Upvotes: 30
Views: 23019
Reputation: 9772
Here is an all-pandas solution that I consider a little neater than some of the other answers. It is also able to handle the corner case where no value of the input series satisfies the condition.
def first_index_ordered(mask):
    assert mask.index.is_monotonic_increasing
    assert mask.dtype == bool
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else idx_min
col = "foo"
thr = 42
mask = df[col] < thr
idx_first = first_index_ordered(mask)
The above assumes that mask has a value-ordered, monotonically increasing index. If this is not the case, we have to do a bit more:
def first_index_unordered(mask):
    assert mask.dtype == bool
    index = mask.index
    # This creates a RangeIndex, which is monotonic
    mask = mask.reset_index(drop=True)
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else index[idx_min]
Of course, we can combine both cases in one function:
def first_index_where(mask):
    if mask.index.is_monotonic_increasing:
        return first_index_ordered(mask)
    else:
        return first_index_unordered(mask)
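To see why the unordered case needs the extra step, here is a small sketch. The Series and its labels are made up for illustration and are not from the question; the index is deliberately unsorted.

```python
import pandas as pd

# Hypothetical data: the index labels are deliberately not sorted
force = pd.Series([5, 4, 2, 1], index=[30, 20, 40, 10])
mask = force < 3

# Taking the minimum label of the True rows gives the smallest label,
# which is not necessarily the first matching row
smallest_label = mask[mask].index.min()

# Working on positions first and mapping back gives the first row in order
positions = mask.reset_index(drop=True)
first_pos = positions[positions].index.min()
first_label = mask.index[first_pos]

print(smallest_label, first_label)
```

Here the first matching row carries the label 40, while the minimum label among the matches is 10, which is why the ordered variant asserts a monotonic index.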
Upvotes: 0
Reputation: 164703
Below is a non-pandas solution which I find easy to adapt:
import pandas as pd
df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
next(idx for idx, x in zip(df.index, df.Force) if x < 3) # d
It works by iterating to the first result of a generator expression.
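Note that a bare next() raises StopIteration when nothing matches. A minimal sketch of one way around that, using the built-in default argument of next() (the choice of None as the fallback is my assumption, pick whatever sentinel suits you):

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), index=list('abcde'))

# The second argument to next() is returned when the generator is exhausted
found = next((idx for idx, x in zip(df.index, df.Force) if x < 3), None)
missing = next((idx for idx, x in zip(df.index, df.Force) if x < 0), None)

print(found, missing)
```

The extra parentheses around the generator expression are required once next() receives a second argument.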
Pandas appears to perform poorly in comparison:
import numpy as np
df = pd.DataFrame(dict(Force=np.random.randint(0, 100000, 100000)))
n = 99900
%timeit df['Force'].lt(n).idxmin()
# 1000 loops, best of 3: 1.57 ms per loop
%timeit df.Force.where(df.Force > n).first_valid_index()
# 100 loops, best of 3: 1.61 ms per loop
%timeit next(idx for idx, x in zip(df.index, df.Force) if x > n)
# 10000 loops, best of 3: 100 µs per loop
Upvotes: 5
Reputation: 7994
You can also try first_valid_index with where.
df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
df.Force.where(df.Force < 3).first_valid_index()
3
where will replace the part that does not meet the condition with np.nan by default. Then, we find the first valid index out of the series.
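As a rough illustration of the intermediate step, the sketch below keeps the masked Series in a variable (the names masked and none_match are mine) and also shows what happens when no row satisfies the condition:

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])

# Rows that fail the condition (5, 4 and 3) are replaced with NaN
masked = df.Force.where(df.Force < 3)
first = masked.first_valid_index()

# If nothing satisfies the condition, every value is NaN
none_match = df.Force.where(df.Force < 0).first_valid_index()

print(masked.tolist(), first, none_match)
```

When every value is NaN, first_valid_index returns None, so the no-match case can be checked explicitly.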
Or this: select the subset of items you are interested in (here, the rows where v == 1, i.e. True), then take the first item of its index.
df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = (df["Force"] < 3)
v[v == 1].index[0]
Bonus: if you need the index of the first appearance of many kinds of items, you can use drop_duplicates.
df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"], ["blue"], ["red"]], columns=["Force"])
df.Force.drop_duplicates().reset_index()
   index  Force
0      0  yello
1      2   blue
2      3    red
With a little more work, you can turn this into a dictionary that maps each value to its first index:
df.Force.drop_duplicates().reset_index().set_index("Force").to_dict()["index"]
{'blue': 2, 'red': 3, 'yello': 0}
Upvotes: 8
Reputation: 294348
Use idxmax to find the first instance of the maximum value. In this case, True is the maximum value.
df['Force'].lt(event).idxmax()
Consider the sample df:
df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
df
Force
a 5
b 4
c 3
d 2
e 1
The first instance of Force being less than 3 is at index 'd'.
df['Force'].lt(3).idxmax()
'd'
Be aware that if no value of Force is less than 3, the maximum will be False, and idxmax will return the label of the first row, which can easily be mistaken for a match.
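One way to guard against that caveat is to check any() before trusting the result. This is only a sketch; the helper name first_label_or_none is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), index=list('abcde'))

def first_label_or_none(mask):
    """Hypothetical helper: label of the first True, or None if there is none."""
    return mask.idxmax() if mask.any() else None

# Without the guard, an all-False mask still yields the first label
unguarded = df['Force'].lt(0).idxmax()

guarded_hit = first_label_or_none(df['Force'].lt(3))
guarded_miss = first_label_or_none(df['Force'].lt(0))

print(unguarded, guarded_hit, guarded_miss)
```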
Also consider the alternative, argmax:
df.Force.lt(3).values.argmax()
3
It returns the position of the first instance of the maximal value. You can then use this to find the corresponding index value:
df.index[df.Force.lt(3).values.argmax()]
'd'
Also, in the future, argmax will be a Series method.
Upvotes: 36