Reputation: 7166
How can I remove leading NaNs in pandas?
pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
I want to remove only the first three NaNs from the series above, so the result should be:
pd.Series([1, 2, np.nan, 3])
Upvotes: 20
Views: 5782
Reputation: 393933
Here is another method using pandas methods only:
In [103]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
first_valid = s[s.notnull()].index[0]
s.iloc[first_valid:]
Out[103]:
3 1
4 2
5 NaN
6 3
dtype: float64
So we filter the series using notnull to get the first valid index, then use iloc to slice the series.
EDIT
As @ajcr has pointed out, it is better to use the built-in method first_valid_index, as this avoids creating the temporary masked series used in the answer above. Additionally, loc slices by index label, whereas iloc slices by ordinal position, so loc also works in the general case where the index is not an Int64Index:
In [104]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
s.loc[s.first_valid_index():]
Out[104]:
3 1
4 2
5 NaN
6 3
dtype: float64
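For reuse, the first_valid_index approach could be wrapped in a small helper. This is only a sketch: drop_leading_nans is a hypothetical name, and returning an empty series for all-NaN input is an assumption about the desired behaviour (first_valid_index returns None in that case, and s.loc[None:] would give back the whole series).

```python
import numpy as np
import pandas as pd


def drop_leading_nans(s: pd.Series) -> pd.Series:
    """Return s without its leading run of NaN values (hypothetical helper)."""
    first = s.first_valid_index()
    if first is None:
        # Every value is NaN (or the series is empty): nothing valid to keep.
        return s.iloc[0:0]
    # Label-based slice, so it also works for a non-integer index.
    return s.loc[first:]


# String labels instead of the default 0..n-1 index.
s = pd.Series([np.nan, np.nan, 1, np.nan, 3], index=list("abcde"))
trimmed = drop_leading_nans(s)
print(trimmed)
```

Note that a label slice with loc assumes the index labels are unique (or sorted); with duplicated, unsorted labels the slice can raise a KeyError.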
Upvotes: 18
Reputation: 221514
Two more approaches could be suggested here, assuming A is the input series.
Approach #1: With slicing -
A[np.where(~np.isnan(A))[0][0]:]
Approach #2: With masking -
A[np.maximum.accumulate(~np.isnan(A))]
Sample run -
In [219]: A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
In [220]: A
Out[220]:
0 NaN
1 NaN
2 NaN
3 1
4 2
5 NaN
6 3
dtype: float64
In [221]: A[np.where(~np.isnan(A))[0][0]:] # Approach 1
Out[221]:
3 1
4 2
5 NaN
6 3
dtype: float64
In [222]: A[np.maximum.accumulate(~np.isnan(A))] # Approach 2
Out[222]:
3 1
4 2
5 NaN
6 3
dtype: float64
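To see why the masking approach keeps the interior NaN, it may help to look at the intermediate arrays. The sketch below works on the underlying NumPy array rather than on the series directly, which is a choice made for this illustration; the names values, valid and mask are not from the answer.

```python
import numpy as np
import pandas as pd

A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

values = A.to_numpy()
valid = ~np.isnan(values)            # True where a value is present
# Running maximum over booleans: once the first True is seen,
# every later entry stays True, including later NaN positions.
mask = np.maximum.accumulate(valid)

print(valid.tolist())
print(mask.tolist())

result = A[mask]                     # boolean indexing keeps rows where mask is True
print(result)
```

Because the running maximum never goes back to False, only the leading run of NaN values is masked out.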
Upvotes: 1
Reputation: 1731
To remove the leading np.nan values:
tab = [np.nan, np.nan, np.nan, 1, 2, np.nan, 3]
# position of the first element that is not NaN
first = next(i for i, n in enumerate(tab) if not np.isnan(n))
pd.Series(tab[first:])
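When the data starts out as a plain Python list, the leading NaN values could also be dropped before the series is built. The following is a sketch using itertools.dropwhile from the standard library, which is a different technique from the one in this answer; the name trimmed is made up for the example.

```python
import itertools
import math

import numpy as np
import pandas as pd

tab = [np.nan, np.nan, np.nan, 1, 2, np.nan, 3]

# dropwhile skips items while the predicate is True, then yields
# everything that follows, so the later NaN is kept.
trimmed = list(itertools.dropwhile(math.isnan, tab))

s = pd.Series(trimmed)
print(s)
```

Unlike the slicing answers above, this produces a fresh 0-based index, because the original positions are lost once the list is trimmed.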
Upvotes: -1
Reputation: 55448
Find the first non-NaN index
To find the index of the first non-NaN item:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
nans = s.isna()
first_non_nan = nans[~nans].index[0] # label of the first non-NaN value
Output
s.loc[first_non_nan:]
Out[44]:
3 1
4 2
5 NaN
6 3
dtype: float64
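The value taken from .index[0] is an index label, which only coincides with the ordinal position for the default 0..n-1 index. If a position is wanted instead, one possible sketch uses argmax on the boolean array; the index values 10 to 50 and the names valid and first_pos are made up for the example.

```python
import numpy as np
import pandas as pd

# Integer labels that deliberately differ from the ordinal positions.
s = pd.Series([np.nan, np.nan, 1, np.nan, 3], index=[10, 20, 30, 40, 50])

valid = s.notna().to_numpy()         # boolean array, True where a value is present
if valid.any():
    first_pos = int(valid.argmax())  # ordinal position of the first True
    trimmed = s.iloc[first_pos:]
else:
    # argmax would return 0 for an all-False array, so guard explicitly.
    trimmed = s.iloc[0:0]

print(trimmed)
```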
Upvotes: 2