Reputation: 53
I have a pandas dataframe from which I want to create a new dataframe by applying a filter based on the count function such that only those columns should be selected whose count is equal to a specified number.
For example in the dataframe below:
month_end Col A Col B Col C Col D
200703 NaN NaN NaN NaN
200704 0.084 0.152 0.142 0.0766
200705 0.124 0.123 0.020 NaN
200706 NaN 0.191 0.091 0.149
200707 -0.136 0.047 0.135 -0.127
If my_variable = 4, then df1 should only contain Col B and Col D alongwith the index month_end.
How do I do this?
Upvotes: 4
Views: 1027
Reputation: 6564
Another solution without a loop:
s = df.notna().sum(0) == 4
df = df.loc[:, s]
Upvotes: 1
Reputation: 88236
You could do something along the lines of:
df.loc[:,df.notna().sum(0).eq(4)]
ColB ColC
0 NaN NaN
1 0.152 0.142
2 0.123 0.020
3 0.191 0.091
4 0.047 0.135
Or there's also count
, which already drops dupes prior to counting:
df.loc[:,df.count().eq(4)]
If you want to include the date column, and it isn't the index:
ix = df.notna().sum(0).eq(4)
df.loc[:,ix.index[ix].union(['month_end'])]
ColB ColC month_end
0 NaN NaN 200703
1 0.152 0.142 200704
2 0.123 0.020 200705
3 0.191 0.091 200706
4 0.047 0.135 200707
Upvotes: 5
Reputation: 6564
A solution with a for loop:
for col in df.columns:
if (df[col].count() != 4):
df.drop(col, axis=1)
Upvotes: 0