Reputation: 1163
I have a dataframe with several columns.
I want to get the first row index for which the values oftwo columns are both not NaN.
I know I need to use df.first_valid_index()
example: if my dataframe is:
Date A B
0 2015-01-02 NaN 1
1 2015-01-02 NaN 2
2 2015-01-02 NaN 3
3 2015-01-02 1 NaN
5 2015-01-02 NaN 2
7 2015-01-03 1 1
I would get 7
Upvotes: 1
Views: 351
Reputation: 221614
One approach -
(~(pd.isnull(df.A) | pd.isnull(df.B))).idxmax()
Similar one -
(~pd.isnull(df[['A','B']]).any(1)).idxmax()
For performance, we might want to use argmax
-
(~pd.isnull(df[['A','B']]).any(1)).argmax()
Thinking purely on performance, we could bring in more of NumPy -
df.index[(~(np.isnan(df.A.values) | np.isnan(df.B.values))).argmax()]
Sample run -
In [172]: df
Out[172]:
Date A B
0 2015-01-02 NaN 1.0
1 2015-01-02 NaN 2.0
2 2015-01-02 NaN 3.0
3 2015-01-02 1.0 NaN
5 2015-01-02 NaN 2.0
7 2015-01-03 1.0 1.0
In [173]: (~(pd.isnull(df.A) | pd.isnull(df.B))).idxmax()
Out[173]: 7
In [180]: (~pd.isnull(df[['A','B']]).any(1)).idxmax()
Out[180]: 7
In [182]: (~pd.isnull(df[['A','B']]).any(1)).argmax()
Out[182]: 7
In [258]: df.index[(~(np.isnan(df.A.values) | np.isnan(df.B.values))).argmax()]
Out[258]: 7
Runtime test -
In [259]: a = np.random.rand(100000,2)
In [260]: a[np.random.rand(*a.shape)>0.2] = np.nan
In [261]: df = pd.DataFrame(a, columns=list(('A','B')))
# @jezrael's soln
In [262]: %timeit df[['A','B']].notnull().all(axis=1).idxmax()
100 loops, best of 3: 4.91 ms per loop
In [263]: %timeit (~(pd.isnull(df.A) | pd.isnull(df.B))).idxmax()
...: %timeit (~pd.isnull(df[['A','B']]).any(1)).idxmax()
...: %timeit (~pd.isnull(df[['A','B']]).any(1)).argmax()
...:
1000 loops, best of 3: 1.37 ms per loop
100 loops, best of 3: 4.73 ms per loop
100 loops, best of 3: 4.74 ms per loop
In [264]: %timeit df.index[(~(np.isnan(df.A.values) | np.isnan(df.B.values))).argmax()]
10000 loops, best of 3: 169 µs per loop
Upvotes: 1
Reputation: 863166
Use notnull
with all
for check all True
values per row and idxmax
:
print (df[['A','B']].notnull())
A B
0 False True
1 False True
2 False True
3 True False
5 False True
7 True True
print (df[['A','B']].notnull().all(axis=1))
0 False
1 False
2 False
3 False
5 False
7 True
dtype: bool
val = df[['A','B']].notnull().all(axis=1).idxmax()
print (val)
7
Upvotes: 1