Jamil
Jamil

Reputation: 53

Pandas Dataframe : Using the count function to filter data

I have a pandas dataframe from which I want to create a new dataframe by applying a filter based on the count function such that only those columns should be selected whose count is equal to a specified number.

For example in the dataframe below:

month_end   Col A       Col B       Col C       Col D
200703      NaN          NaN         NaN         NaN
200704      0.084       0.152       0.142      0.0766
200705      0.124       0.123       0.020       NaN 
200706      NaN         0.191       0.091       0.149   
200707      -0.136      0.047       0.135      -0.127

If my_variable = 4, then df1 should only contain Col B and Col D alongwith the index month_end.

How do I do this?

Upvotes: 4

Views: 1027

Answers (3)

gtomer
gtomer

Reputation: 6564

Another solution without a loop:

s = df.notna().sum(0) == 4     
df = df.loc[:, s]

Upvotes: 1

yatu
yatu

Reputation: 88236

You could do something along the lines of:

df.loc[:,df.notna().sum(0).eq(4)]

    ColB   ColC
0    NaN    NaN
1  0.152  0.142
2  0.123  0.020
3  0.191  0.091
4  0.047  0.135

Or there's also count, which already drops dupes prior to counting:

df.loc[:,df.count().eq(4)]

If you want to include the date column, and it isn't the index:

ix = df.notna().sum(0).eq(4)
df.loc[:,ix.index[ix].union(['month_end'])]

    ColB   ColC  month_end
0    NaN    NaN     200703
1  0.152  0.142     200704
2  0.123  0.020     200705
3  0.191  0.091     200706
4  0.047  0.135     200707

Upvotes: 5

gtomer
gtomer

Reputation: 6564

A solution with a for loop:

for col in df.columns:
    if (df[col].count() != 4):
        df.drop(col, axis=1)

Upvotes: 0

Related Questions