Reputation: 15
I have a data set with 1081 columns. In some of these columns, 99% of the values are zero. I want to pick out those columns and store them in a new data frame. This is as far as I have got; can anyone help me write the rest of the code?
for cl, ro in df.items():  # iteritems() was removed in pandas 2.0; items() is the replacement
    n_zeros = (ro == 0).sum()
    percent_zero = n_zeros / len(df) * 100
Upvotes: 1
Views: 106
Reputation: 261860
You could use:
df.loc[:, df.eq(0).sum().div(df.shape[0]).gt(0.99)]
example (here with a 95% threshold):
import numpy as np
import pandas as pd

np.random.seed(0)
a = np.random.choice([0, 1], size=(100, 10), p=[0.95, 0.05])
df = pd.DataFrame(a)
# fraction of zeros per column, compared against the threshold
mask = df.eq(0).sum().div(df.shape[0]).gt(0.95)
out = df.loc[:, mask]  # use out = df.loc[:, ~mask] to drop the columns instead
output:
     1   3   6   7
0    0   0   0   0
1    0   0   0   0
2    0   0   0   0
3    0   0   0   0
4    0   0   0   0
..  ..  ..  ..  ..
95   1   0   0   0
96   0   0   0   0
97   0   0   0   0
98   0   0   0   0
99   0   0   0   0
[100 rows x 4 columns]
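If you want both halves of the split (the mostly-zero columns in one frame, everything else in another), the same mask can be wrapped in a small helper. This is only a sketch: the function name `split_mostly_zero`, its `threshold` parameter and the `demo` frame are made up for illustration. It uses `ge` ("at least" the threshold) rather than the `gt` used above, so swap it back if you need a strictly-greater comparison. Note that `NaN` values compare unequal to 0, so they count as non-zero here.

```python
import pandas as pd


def split_mostly_zero(df, threshold=0.99):
    """Return (mostly_zero, rest) for a DataFrame.

    Hypothetical helper for illustration; `threshold` is the minimum
    fraction of zeros a column needs to land in the first frame.
    """
    # The mean of a boolean frame is the fraction of True values per column,
    # which is equivalent to .sum().div(df.shape[0]).
    zero_fraction = df.eq(0).mean()
    mask = zero_fraction.ge(threshold)  # assumption: "at least" threshold
    return df.loc[:, mask], df.loc[:, ~mask]


# Small deterministic frame with 200 rows:
# "a" is 99.5% zeros, "b" is 50% zeros, "c" is 100% zeros.
demo = pd.DataFrame({
    "a": [1] + [0] * 199,
    "b": [0, 1] * 100,
    "c": [0] * 200,
})

sparse, dense = split_mostly_zero(demo, threshold=0.99)
print(list(sparse.columns))  # ['a', 'c']
print(list(dense.columns))   # ['b']
```

For the 1081-column data in the question, `sparse, dense = split_mostly_zero(df)` should give the new frame in `sparse` and leave the remaining columns in `dense`.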
Upvotes: 1