Reputation: 1563
I currently have a dataframe consisting of columns with 1's and 0's as values, I would like to iterate through the columns and delete the ones that are made up of only 0's. Here's what I have tried so far:
ones = []
zeros = []
for year in years:
for i in range(0,599):
if year[str(i)].values.any() == 1:
ones.append(i)
if year[str(i)].values.all() == 0:
zeros.append(i)
for j in ones:
if j in zeros:
zeros.remove(j)
for q in zeros:
del year[str(q)]
In which years is a list of dataframes for the various years I am analyzing, ones consists of columns with a one in them and zeros is a list of columns containing all zeros. Is there a better way to delete a column based on a condition? For some reason I have to check whether the ones columns are in the zeros list as well and remove them from the zeros list to obtain a list of all the zero columns.
Upvotes: 141
Views: 175790
Reputation: 665
If you want to check if there is at least one column full of zeros in your dataframe, you could use :
(df==0).all().any() # Returns True if a column of the dataframe is made up entirely of zeros.
Note : This answer does not cover the "deleting" part of this particular question, because here you don't retrieve an identifier for the "all zero" column.
But some, like me, might be redirected here from a "closed as duplicate" question : Check if pandas column contains all zeros which seems a bit different as it asks if a column contains only zeros. This answer covers an extension of that particular linked question : check if a dataframe contains at least one column with only zeros.
Upvotes: 1
Reputation: 21
This should do the work:
zero_cols = df.columns[(df == 0).all()]
df.drop(labels=zero_cols, axis=1, inplace=True)
Upvotes: 2
Reputation: 944
In case there are some NaN values in your columns, you may want to use this approach if you want to remove columns that have both 0 and NaN :
df.loc[:, (df**2).sum() != 0]
Upvotes: 3
Reputation: 1863
In case you'd like a more expressive way of getting the zero-column names so you can print / log them, and drop them, in-place, by their names:
zero_cols = [ col for col, is_zero in ((df == 0).sum() == df.shape[0]).items() if is_zero ]
df.drop(zero_cols, axis=1, inplace=True)
Some break down:
# a pandas Series with {col: is_zero} items
# is_zero is True when the number of zero items in that column == num_all_rows
(df == 0).sum() == df.shape[0])
# a list comprehension of zero_col_names is built from the_series
[ col for col, is_zero in the_series.items() if is_zero ]
Upvotes: 2
Reputation: 2200
Here is an alternative way to use is
df.replace(0,np.nan).dropna(axis=1,how="all")
Compared with the solution of unutbu, this way is obviously slower:
%timeit df.loc[:, (df != 0).any(axis=0)]
652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.replace(0,np.nan).dropna(axis=1,how="all")
1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 17
Reputation: 879641
df.loc[:, (df != 0).any(axis=0)]
Here is a break-down of how it works:
In [74]: import pandas as pd
In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])
In [76]: df
Out[76]:
0 1 2 3
0 1 0 0 0
1 0 0 1 0
[2 rows x 4 columns]
df != 0
creates a boolean DataFrame which is True where df
is nonzero:
In [77]: df != 0
Out[77]:
0 1 2 3
0 True False False False
1 False False True False
[2 rows x 4 columns]
(df != 0).any(axis=0)
returns a boolean Series indicating which columns have nonzero entries. (The any
operation aggregates values along the 0-axis -- i.e. along the rows -- into a single boolean value. Hence the result is one boolean value for each column.)
In [78]: (df != 0).any(axis=0)
Out[78]:
0 True
1 False
2 True
3 False
dtype: bool
And df.loc
can be used to select those columns:
In [79]: df.loc[:, (df != 0).any(axis=0)]
Out[79]:
0 2
0 1 0
1 0 1
[2 rows x 2 columns]
To "delete" the zero-columns, reassign df
:
df = df.loc[:, (df != 0).any(axis=0)]
Upvotes: 327