Arwa

Reputation: 166

How to calculate the mean of a specific value in columns in Python?

I'm trying to drop columns that have too many missing values. The missing values are represented by 99 or 90, so how can I count the occurrences of those values within each column?

Here is the code that is supposed to drop the columns that exceed the threshold:

threshold = 0.6

data = data[data.columns[[data.column == 90 or data.column == 99].count().mean() < threshold]]

I'm not very familiar with pandas, so any suggestions would be helpful.

Upvotes: 1

Views: 193

Answers (1)

mozway

Reputation: 260640

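You're almost there. Use apply with isin to flag the 90/99 values, then keep the rows whose share of flagged values is below the threshold (mean(1) averages across each row):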

threshold = 0.6
out = data[data.apply(lambda s: s.isin([90, 99])).mean(1).lt(threshold)]

Example input:

    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
4  99  99   0  90  99  # to drop
5  99   0   0   0  99
6   0   0  99   0  90
7   0  90  99   0  90  # to drop
8  99  90   0  90   0  # to drop
9   0  99   0   0   0

Output:

    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
5  99   0   0   0  99
6   0   0  99   0  90
9   0  99   0   0   0
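
Note that the snippet above filters rows, because mean(1) averages across each row. Since the question asks about columns, here is a minimal sketch of the column-wise variant. It assumes the same example input, and the names is_missing, col_missing, rows_kept and cols_kept are illustrative rather than part of the original answer. It also calls data.isin directly, which should give the same boolean mask as the apply/lambda form.

```python
import pandas as pd

# Example input copied from the answer above
data = pd.DataFrame([
    [0, 90, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 90, 0, 99, 0],
    [90, 0, 0, 0, 0],
    [99, 99, 0, 90, 99],
    [99, 0, 0, 0, 99],
    [0, 0, 99, 0, 90],
    [0, 90, 99, 0, 90],
    [99, 90, 0, 90, 0],
    [0, 99, 0, 0, 0],
])

threshold = 0.6

# True wherever a cell holds one of the missing-value codes
is_missing = data.isin([90, 99])

# Row-wise filter: same result as the one-liner in the answer
rows_kept = data[is_missing.mean(axis=1).lt(threshold)]

# Column-wise filter: fraction of missing codes per column,
# then keep only the columns strictly below the threshold
col_missing = is_missing.mean(axis=0)
cols_kept = data.loc[:, col_missing.lt(threshold)]

print(col_missing)
print(cols_kept)
```

With this example input, column 1 holds a missing-value code in 6 of its 10 rows, so it is the only column removed. Because the comparison is strict (lt), a column sitting exactly at the threshold is dropped; use le instead if such columns should be kept.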

Upvotes: 3
