user2587593
user2587593

Reputation: 1563

How do I delete a column that contains only zeros in Pandas?

I currently have a dataframe consisting of columns with 1's and 0's as values, I would like to iterate through the columns and delete the ones that are made up of only 0's. Here's what I have tried so far:

ones = []
zeros = []
for year in years:
    for i in range(0,599):
        if year[str(i)].values.any() == 1:
            ones.append(i)
        if year[str(i)].values.all() == 0:
            zeros.append(i)
    for j in ones:
        if j in zeros:
            zeros.remove(j)
    for q in zeros:
        del year[str(q)]

In which years is a list of dataframes for the various years I am analyzing, ones consists of columns with a one in them and zeros is a list of columns containing all zeros. Is there a better way to delete a column based on a condition? For some reason I have to check whether the ones columns are in the zeros list as well and remove them from the zeros list to obtain a list of all the zero columns.

Upvotes: 141

Views: 175790

Answers (6)

Arthur
Arthur

Reputation: 665

If you want to check if there is at least one column full of zeros in your dataframe, you could use :

(df==0).all().any() # Returns True if a column of the dataframe is made up entirely of zeros.

Note : This answer does not cover the "deleting" part of this particular question, because here you don't retrieve an identifier for the "all zero" column.

But some, like me, might be redirected here from a "closed as duplicate" question : Check if pandas column contains all zeros which seems a bit different as it asks if a column contains only zeros. This answer covers an extension of that particular linked question : check if a dataframe contains at least one column with only zeros.

Upvotes: 1

Amanda
Amanda

Reputation: 21

This should do the work:

zero_cols = df.columns[(df == 0).all()]
df.drop(labels=zero_cols, axis=1, inplace=True)

Upvotes: 2

cyrilb38
cyrilb38

Reputation: 944

In case there are some NaN values in your columns, you may want to use this approach if you want to remove columns that have both 0 and NaN :

df.loc[:, (df**2).sum() != 0]

Upvotes: 3

mork
mork

Reputation: 1863

In case you'd like a more expressive way of getting the zero-column names so you can print / log them, and drop them, in-place, by their names:

zero_cols = [ col for col, is_zero in ((df == 0).sum() == df.shape[0]).items() if is_zero ]
df.drop(zero_cols, axis=1, inplace=True)

Some break down:

# a pandas Series with {col: is_zero} items
# is_zero is True when the number of zero items in that column == num_all_rows
(df == 0).sum() == df.shape[0])

# a list comprehension of zero_col_names is built from the_series
[ col for col, is_zero in the_series.items() if is_zero ]

Upvotes: 2

Jeremy Z
Jeremy Z

Reputation: 2200

Here is an alternative way to use is

df.replace(0,np.nan).dropna(axis=1,how="all")

Compared with the solution of unutbu, this way is obviously slower:

%timeit df.loc[:, (df != 0).any(axis=0)]
652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.replace(0,np.nan).dropna(axis=1,how="all")
1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 17

unutbu
unutbu

Reputation: 879641

df.loc[:, (df != 0).any(axis=0)]

Here is a break-down of how it works:

In [74]: import pandas as pd

In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])

In [76]: df
Out[76]: 
   0  1  2  3
0  1  0  0  0
1  0  0  1  0

[2 rows x 4 columns]

df != 0 creates a boolean DataFrame which is True where df is nonzero:

In [77]: df != 0
Out[77]: 
       0      1      2      3
0   True  False  False  False
1  False  False   True  False

[2 rows x 4 columns]

(df != 0).any(axis=0) returns a boolean Series indicating which columns have nonzero entries. (The any operation aggregates values along the 0-axis -- i.e. along the rows -- into a single boolean value. Hence the result is one boolean value for each column.)

In [78]: (df != 0).any(axis=0)
Out[78]: 
0     True
1    False
2     True
3    False
dtype: bool

And df.loc can be used to select those columns:

In [79]: df.loc[:, (df != 0).any(axis=0)]
Out[79]: 
   0  2
0  1  0
1  0  1

[2 rows x 2 columns]

To "delete" the zero-columns, reassign df:

df = df.loc[:, (df != 0).any(axis=0)]

Upvotes: 327

Related Questions