Reputation: 7404
Here is a sample of some data I am working with.
A B C
2014-01-01 4072.0 9871.0 NaN
2014-02-01 NaN 9948.0 NaN
2014-03-01 NaN 10248.0 NaN
2014-04-01 NaN 9872.0 NaN
2014-05-01 NaN 12438.0 NaN
2014-06-01 NaN 10588.0 NaN
2014-07-01 NaN 8718.0 NaN
2014-08-01 NaN 10378.0 NaN
2014-09-01 NaN 9563.0 NaN
2014-10-01 NaN 10669.0 NaN
2014-11-01 NaN 9843.0 NaN
2014-12-01 NaN 9837.0 NaN
2015-01-01 NaN 8606.0 NaN
2015-02-01 NaN 10458.0 NaN
2015-03-01 NaN 9351.0 NaN
2015-04-01 NaN 8705.0 NaN
2015-05-01 NaN 11887.0 NaN
2015-06-01 NaN 8979.0 NaN
2015-07-01 NaN 8373.0 NaN
2015-08-01 NaN 10206.0 NaN
2015-09-01 NaN 9672.0 NaN
2015-10-01 NaN 10351.0 NaN
2015-11-01 NaN 8482.0 808.0
2015-12-01 NaN 7987.0 7691.0
2016-01-01 NaN 7881.0 8327.0
2016-02-01 NaN 7418.0 8220.0
2016-03-01 NaN 6324.0 9086.0
2016-04-01 NaN 3617.0 8362.0
2016-05-01 NaN 39.0 13298.0
2016-06-01 NaN 0.0 13408.0
2016-07-01 NaN NaN 16140.0
2016-08-01 NaN NaN 14520.0
2016-09-01 NaN NaN 14800.0
2016-10-01 NaN NaN 15407.0
2016-11-01 NaN NaN 15812.0
2016-12-01 NaN NaN 2017.0
Some of the columns (like A and B) have a non-nans in the first couple of rows. Some columns (like C) have non nans in the last couple of rows.
I'm interested in removing columns like C. How can I slice these out?
Upvotes: 2
Views: 132
Reputation: 862761
You can use boolean mask
with first row of data selected by iloc
and notnull
, last select by loc
because select by boolean indexing
columns:
print (df.iloc[0].notnull())
A True
B True
C False
Name: 2014-01-01, dtype: bool
print (df.loc[:, df.iloc[0].notnull()])
A B
2014-01-01 4072.0 9871.0
2014-02-01 NaN 9948.0
2014-03-01 NaN 10248.0
2014-04-01 NaN 9872.0
2014-05-01 NaN 12438.0
2014-06-01 NaN 10588.0
2014-07-01 NaN 8718.0
2014-08-01 NaN 10378.0
2014-09-01 NaN 9563.0
2014-10-01 NaN 10669.0
2014-11-01 NaN 9843.0
2014-12-01 NaN 9837.0
2015-01-01 NaN 8606.0
Another solution:
print (df[df.columns[df.iloc[0].notnull()]])
Timings:
#maxu solution
In [216]: %timeit (df[df.columns[df.notnull().iloc[0]]])
1000 loops, best of 3: 995 µs per loop
In [217]: %timeit (df.loc[:, df.iloc[0].notnull()])
1000 loops, best of 3: 635 µs per loop
In [218]: %timeit (df[df.columns[df.iloc[0].notnull()]])
1000 loops, best of 3: 820 µs per loop
#[360000 rows x 3 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)
#maxu solution
In [233]: %timeit (df[df.columns[df.notnull().iloc[0]]])
100 loops, best of 3: 7.07 ms per loop
In [234]: %timeit (df.loc[:, df.iloc[0].notnull()])
100 loops, best of 3: 4.14 ms per loop
In [235]: %timeit (df[df.columns[df.iloc[0].notnull()]])
100 loops, best of 3: 4.3 ms per loop
Upvotes: 2
Reputation: 210852
In [70]: df[df.columns[df.notnull().iloc[0]]].head()
Out[70]:
A B
2014-01-01 4072.0 9871.0
2014-02-01 NaN 9948.0
2014-03-01 NaN 10248.0
2014-04-01 NaN 9872.0
2014-05-01 NaN 12438.0
Explanation:
In [71]: df.notnull().iloc[0]
Out[71]:
A True
B True
C False
Name: 2014-01-01, dtype: bool
In [72]: df.columns[df.notnull().iloc[0]]
Out[72]: Index(['A', 'B'], dtype='object')
Upvotes: 2