BML91
BML91

Reputation: 3180

Slicing a pandas dataframe to the first instance of all columns containing a value

I have simple pandas time series dataframe similar to this:

In [69]: df
Out[69]:
              A    B
Date
2015-01-01  NaN  NaN
2015-02-01  1.1  NaN
2015-03-01  NaN  NaN
2015-04-01  1.2  NaN
2015-05-01  1.5  1.2
2015-06-01  1.6  1.9
2015-07-01  1.3  NaN
2015-08-01  1.2  3.0
2015-09-01  1.1  1.1

What is the best way to obtain a data frame from the first point where there is a value in all columns onwards, i.e. to get this output programmatically?

In [71]: df.ix[4:]
Out[71]:
              A    B
Date
2015-05-01  1.5  1.2
2015-06-01  1.6  1.9
2015-07-01  1.3  NaN
2015-08-01  1.2  3.0
2015-09-01  1.1  1.1

Upvotes: 5

Views: 6810

Answers (3)

Alexander
Alexander

Reputation: 109546

You can get the first index value of a subset of non-null data (across the rows, or axis=1). Then use .ix to select all data from this index value forward.

idx = df[df.notnull().all(axis=1)].index[0]
>>> df.ix[idx:]

          A    B
Date            
5/1/15  1.5  1.2
6/1/15  1.6  1.9
7/1/15  1.3  NaN
8/1/15  1.2  3.0
9/1/15  1.1  1.1

Upvotes: 2

EdChum
EdChum

Reputation: 394041

You can use the default behaviour of dropna and use the first index that to slice your df:

In [21]:    
df.loc[df.dropna().index[0]:]

Out[21]:
              A    B
Date                
2015-05-01  1.5  1.2
2015-06-01  1.6  1.9
2015-07-01  1.3  NaN
2015-08-01  1.2  3.0
2015-09-01  1.1  1.1

So this works because it will drop all rows that contain a single NaN so you just use the first index value from this to slice the df:

In [22]:
df.dropna()

Out[22]:
              A    B
Date                
2015-05-01  1.5  1.2
2015-06-01  1.6  1.9
2015-08-01  1.2  3.0
2015-09-01  1.1  1.1

Upvotes: 5

Jianxun Li
Jianxun Li

Reputation: 24742

You can use .first_valid_index() to get the first non-NaN index for a column.

# your data
# ============================
df

              A    B
Date                
2015-01-01  NaN  NaN
2015-02-01  1.1  NaN
2015-03-01  NaN  NaN
2015-04-01  1.2  NaN
2015-05-01  1.5  1.2
2015-06-01  1.6  1.9
2015-07-01  1.3  NaN
2015-08-01  1.2  3.0
2015-09-01  1.1  1.1

# processing
# ================================
# get the first valid index for each column
# and calculate the max
first_valid_loc = df.apply(lambda col: col.first_valid_index()).max()

df.loc[first_valid_loc:]

              A    B
Date                
2015-05-01  1.5  1.2
2015-06-01  1.6  1.9
2015-07-01  1.3  NaN
2015-08-01  1.2  3.0
2015-09-01  1.1  1.1

Upvotes: 11

Related Questions