Reputation: 429

pandas: drop columns whose missing rate is over 90%

How can I use the line below to drop the columns of a pandas DataFrame whose missing rate is over 90%?

This line shows every column together with its missing rate (in percent):

percentage = (LoanStats_securev1_2018Q1.isnull().sum()/LoanStats_securev1_2018Q1.isnull().count()*100).sort_values(ascending = False)

Could someone familiar with pandas please help?

Upvotes: 1

Answers (2)

Sreekiran A R

Reputation: 3421

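You can use dropna with a threshold:

    newdf = df.dropna(axis=1, thresh=len(df) * 0.1)

axis=1 selects columns, and thresh is the minimum number of non-NA values a column needs in order to be kept. Dropping columns that are more than 90% missing therefore means requiring at least 10% non-NA values, hence len(df) * 0.1. Using len(df) * 0.9 would instead drop every column with more than 10% missing.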
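
To make the relationship between thresh and a missing rate concrete, here is a minimal sketch on a toy frame. The column names and the max_missing and min_non_na variables are made up for illustration; only dropna, axis and thresh come from pandas itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 10 rows, three columns with different missing rates.
df = pd.DataFrame({
    "full": np.arange(10, dtype=float),   # 0% missing
    "sparse": [1.0] + [np.nan] * 9,       # 90% missing
    "empty": [np.nan] * 10,               # 100% missing
})

max_missing = 0.9  # drop columns whose missing rate exceeds this

# thresh counts non-NA values, so convert the allowed missing rate into
# the minimum number of non-NA values a column must have to survive.
min_non_na = len(df) - int(len(df) * max_missing)

newdf = df.dropna(axis=1, thresh=min_non_na)
print(list(newdf.columns))
```

Here a column needs at least one non-NA value out of ten to be kept, so "sparse" (exactly 90% missing) stays and "empty" is dropped.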

Upvotes: 4

jezrael

Reputation: 863031

I think you need boolean indexing with the mean of a boolean mask:

df = df.loc[:, df.isnull().mean() < .9]

Sample:

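import numpy as np
import pandas as pd

np.random.seed(2018)

df = pd.DataFrame(np.random.randn(20,3), columns=list('ABC'))
df.iloc[3:8,0] = np.nan    # column A: 5 of 20 values missing (25%)
df.iloc[:-1,1] = np.nan    # column B: 19 of 20 values missing (95%)
df.iloc[1:,2] = np.nan     # column C: 19 of 20 values missing (95%)
print(df)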
           A         B         C
0  -0.276768       NaN  2.148399
1  -1.279487       NaN       NaN
2  -0.142790       NaN       NaN
3        NaN       NaN       NaN
4        NaN       NaN       NaN
5        NaN       NaN       NaN
6        NaN       NaN       NaN
7        NaN       NaN       NaN
8  -0.172797       NaN       NaN
9  -1.604543       NaN       NaN
10 -0.276501       NaN       NaN
11  0.704780       NaN       NaN
12  0.138125       NaN       NaN
13  1.072796       NaN       NaN
14 -0.803375       NaN       NaN
15  0.047084       NaN       NaN
16 -0.013434       NaN       NaN
17 -1.580231       NaN       NaN
18 -0.851835       NaN       NaN
19 -0.148534  0.133759       NaN

print(df.isnull().mean())
A    0.25
B    0.95
C    0.95
dtype: float64

df = df.loc[:, df.isnull().mean() < .9]
print (df)
           A
0  -0.276768
1  -1.279487
2  -0.142790
3        NaN
4        NaN
5        NaN
6        NaN
7        NaN
8  -0.172797
9  -1.604543
10 -0.276501
11  0.704780
12  0.138125
13  1.072796
14 -0.803375
15  0.047084
16 -0.013434
17 -1.580231
18 -0.851835
19 -0.148534
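
If the same filter is needed in several places, the mask approach can be wrapped in a small helper. This is only a sketch: the function name drop_sparse_columns, the max_missing parameter and the demo frame are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(frame, max_missing=0.9):
    """Return a copy of frame without columns whose missing rate exceeds max_missing."""
    # isnull() gives a boolean frame; its column-wise mean is the missing rate.
    missing_rate = frame.isnull().mean()
    return frame.loc[:, missing_rate <= max_missing].copy()

# Hypothetical demo data: A is 25% missing, B is 100% missing.
demo = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0],
    "B": [np.nan, np.nan, np.nan, np.nan],
})

# Same information as the percentage line in the question, via mean().
missing_pct = (demo.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)

result = drop_sparse_columns(demo)
print(list(result.columns))
```

The helper uses <= so that only columns strictly over the limit are dropped. The one-liner above uses < .9, which also drops a column that is exactly 90% missing, so pick whichever boundary you need.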

Upvotes: 1
