Jacob

Reputation: 41

Conditionally dropping columns in a pandas dataframe

I have this dataframe and my goal is to remove any columns that have less than 1000 entries.

Prior to pivoting the df, I know I have 880 unique well_id's with entry counts ranging from 4 to 60k+. I know I should end up with 102 well_id's.

I tried to accomplish this in a very naïve way: collecting the wells I want to remove in an array and deleting them in a loop. But I keep getting a 'TypeError: Level type mismatch', even though del works fine outside of a loop.

#this works
del df[164301.0]
del df['TB-0071']

# this doesn't work
for id in unwanted_id:
    del df[id]

Any help is appreciated, Thanks.


Upvotes: 1

Views: 178

Answers (2)

ThePyGuy

Reputation: 18416

You can use the pandas drop method:

df.drop(columns=['colName'], inplace=True)

You can also pass it a list of column names:

unwanted_id = [164301.0, 'TB-0071']

df.drop(columns=unwanted_id, inplace=True)

Sample:

df[:5]
  from to  freq
0    A  X    20
1    B  Z     9
2    A  Y     2
3    A  Z     5
4    A  X     8

df.drop(columns=['from', 'to'])
   freq
0    20
1     9
2     2
3     5
4     8

And to get the names of the columns with more than 1000 unique values, you can use something like this:

counts = df.nunique()[df.nunique()>1000].to_frame('uCounts').reset_index().rename(columns={'index':'colName'})

counts

  colName  uCounts
0      to     1001
1    freq     1050
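Since the question asks about columns with fewer than 1000 entries (non-null values) rather than unique values, the same drop call can also be fed a mask built from count(). A minimal sketch on a toy frame, with a threshold of 5 standing in for 1000:

```python
import numpy as np
import pandas as pd

# Toy frame: column 'a' is mostly NaN, 'b' and 'c' are fully populated
df = pd.DataFrame({
    'a': [1.0] + [np.nan] * 9,
    'b': range(10),
    'c': range(10),
})

# df.count() gives non-null entries per column; keep only well-filled columns
unwanted = df.columns[df.count() < 5]

df = df.drop(columns=unwanted)
print(list(df.columns))  # → ['b', 'c']
```

This avoids the del loop entirely: the mask handles mixed-type column labels (floats and strings alike) in one vectorized pass.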

Upvotes: 1

johnjohn

Reputation: 892

You can use the dropna method:

df.dropna(axis='columns', thresh=1000)  # keep only columns with at least 1000 non-NA values

The advantage of this method is that you don't need to create a list.

Also, don't forget to add the usual inplace=True if you want the changes made in place.
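For example, with axis='columns' and a small threshold of 3 standing in for the question's 1000:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'sparse': [1.0, np.nan, np.nan, np.nan],  # only 1 non-NA value
    'dense':  [1.0, 2.0, 3.0, 4.0],           # fully populated
})

# Drop any column that has fewer than 3 non-NA values
trimmed = df.dropna(axis='columns', thresh=3)
print(list(trimmed.columns))  # → ['dense']
```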

Upvotes: 2
