Dov

Reputation: 326

Filter out rows/columns with zero values in MultiIndex dataframe

I have the following pandas MultiIndex DataFrame in Python:

             0         1         2         3 
bar one  0.000000 -0.929631  0.688818 -1.264180
    two  1.130977  0.063277  0.161366  0.598538
baz one  1.420532  0.052530 -0.701400  0.678847
    two -1.197097  0.314381  0.269551  1.115699
foo one -0.077463  0.437145 -0.202377  0.260864
    two -0.815926 -0.508988 -1.238619  0.899013
qux one -0.347863 -0.999990 -1.428958 -1.488556
    two  1.218567 -0.593987  0.099003  0.800736

My question: how can I filter out

  1. Columns that contain zero values (column 0 in the example above)?
  2. Rows that contain zero values? How can I drop (bar, one) alone, and how can I drop both (bar, one) and (bar, two)?

    (Apologies for my non-native English ;)

Upvotes: 4

Views: 21782

Answers (1)

Julien Spronck

Reputation: 15433

To filter out columns that contain zero values, you can use

df2 = df.loc[:, (df != 0).all(axis=0)]

To filter out rows that contain zero values, you can use

df2 = df.loc[(df != 0).all(axis=1), :]
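
As a rough illustration of how the two masks work, here is a small sketch on a hand-made frame. The data and the names nonzero, keep_cols and keep_rows are made up for this example; only the (df != 0).all(...) pattern comes from the lines above.

```python
import pandas as pd

# Small MultiIndex frame with a single zero at row ('bar', 'one'), column 0
index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(
    [[0.0, 1.0, 2.0],
     [3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0],
     [9.0, 1.5, 2.5]],
    index=index,
)

nonzero = df != 0                  # boolean frame, True where the value is non-zero
keep_cols = nonzero.all(axis=0)    # one flag per column: True if no zero in that column
keep_rows = nonzero.all(axis=1)    # one flag per row: True if no zero in that row

cols_filtered = df.loc[:, keep_cols]   # column 0 is gone, all rows kept
rows_filtered = df.loc[keep_rows, :]   # row ('bar', 'one') is gone, all columns kept
```

Note that axis=0 reduces down the rows and so yields one flag per column, which is why it is the one used to select columns.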

To drop specific rows by their index labels, you can use

df2 = df.drop('bar') ## drops both 'bar one' and 'bar two'
df2 = df.drop(('baz', 'two')) ## drops only 'baz two'

For example,

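import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df.loc[('bar', 'one'), 2] = 0 ## put a zero in column 2 (.ix was removed in pandas 1.0)
df = df.loc[:, (df != 0).all(axis=0)] ## drops column 2, which now contains a zero
df = df.drop('bar') ## drops both 'bar one' and 'bar two'
df = df.drop(('baz', 'two')) ## drops only 'baz two'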

#                 0         1         3
# baz one  0.686969  0.410614  0.841630
# foo one  1.522938  0.555734 -1.585507
#     two -0.975976  0.522571 -0.041386
# qux one -0.991787  0.154645  0.179536
#     two -0.725685  0.809784  0.394708

Another way, if your dataframe has no NaN values, is to turn the 0s into NaN and then drop the columns or rows that contain NaN:

df[df != 0.].dropna(axis=1) # to remove the columns with 0
df[df != 0.].dropna(axis=0) # to remove the rows with 0
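
The "no NaN values" condition matters because dropna cannot tell a masked zero from a NaN that was already there. A minimal sketch, using made-up data with one zero and one pre-existing NaN (the names masked, via_dropna and via_mask are only for this illustration):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(
    [[0.0, 1.0, 2.0],
     [3.0, np.nan, 5.0],   # a NaN that was in the data from the start
     [6.0, 7.0, 8.0],
     [9.0, 1.5, 2.5]],
    index=index,
)

masked = df[df != 0]                   # zeros become NaN; the existing NaN stays NaN
via_dropna = masked.dropna(axis=1)     # drops column 0 (zero) AND column 1 (old NaN)
via_mask = df.loc[:, (df != 0).all(axis=0)]   # drops only column 0
```

If the data may already contain NaN, the boolean-mask version is probably the safer of the two.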

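Finally, if you want to drop a whole first-level group (for example both 'bar one' and 'bar two') as soon as one of its rows contains a zero, you can do this:

labels = df.loc[(df == 0).any(axis=1), :].index.get_level_values(0).unique() ## first-level labels of rows that contain 0
for label in labels: ## unique labels, so no label is dropped twice (which would raise a KeyError)
    df = df.drop(label) ## drops every row under that first-level label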
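
A loop-free variant of the same idea is sketched below. It is one possible alternative rather than the only way to do it; the data and the names has_zero, bad_groups and result are assumptions made for this example.

```python
import pandas as pd

index = pd.MultiIndex.from_product([["bar", "baz", "foo"], ["one", "two"]])
df = pd.DataFrame(
    [[0.0, 1.0],   # bar one: zero
     [0.0, 2.0],   # bar two: zero (same group twice)
     [3.0, 4.0],
     [5.0, 6.0],
     [7.0, 0.0],   # foo one: zero
     [8.0, 9.0]],
    index=index,
)

has_zero = (df == 0).any(axis=1)       # True for rows containing at least one zero
# First-level labels of the offending rows, each listed once
bad_groups = df.index[has_zero.to_numpy()].get_level_values(0).unique()
# Keep only rows whose first-level label is not among the offending groups
result = df[~df.index.get_level_values(0).isin(bad_groups)]
```

Because the filter is a boolean mask, it also copes with a group that has zeros in several rows and with a frame that has no zeros at all.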

Upvotes: 11
