Stan

Reputation: 884

Filtering a Pandas DataFrame Based on a List of Column Names

I have a pandas DataFrame with roughly 1000 columns, but I don't need most of them. I only need the columns whose names match, start with, or contain specific strings.

So let's say my DataFrame's columns (df.columns) are:

  HYTY, ABNH, CDKL, GHY@UIKI, BYUJI@#hy, BYUJI@tt, BBNNII#5, FGATAY@J, ...

I want to select only the columns whose names look like HYTY, CDKL, BYUJI* and BBNNI*.

What I tried was to build a single regular expression from a list of patterns, like this:

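  import re

  # Literal prefixes to keep. A trailing '*' is glob syntax, not regex, so it is left off
  relst = ['HYTY', 'CDKL', 'BYUJI', 'BBNNI']

  # Escape each prefix and anchor it with '^' so it only matches at the start of a name
  my_w_lst = ['^' + re.escape(s) for s in relst]

  mask_pattrn = '|'.join(my_w_lst)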

Then I create a boolean mask (one True/False value per column) that says whether each column name matches. However, I don't understand how to get a DataFrame containing only the columns marked True.
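A minimal sketch of that last step, assuming a small hypothetical DataFrame built from the sample column names with placeholder values. Note that in a regex 'CDKL*' means "CDK followed by any number of L's", and re.escape('CDKL*') turns the '*' into a literal asterisk, so the sketch drops the wildcards and anchors each prefix with '^' instead. A boolean array with one entry per column can then be passed as the column indexer of df.loc:

```python
import re

import pandas as pd

# Hypothetical DataFrame using the sample column names (values are placeholders)
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI', 'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([list(range(len(cols)))], columns=cols)

# Literal prefixes, escaped and anchored at the start of the column name
relst = ['HYTY', 'CDKL', 'BYUJI', 'BBNNI']
mask_pattrn = '|'.join('^' + re.escape(s) for s in relst)

# One True/False value per column name (str.contains uses re.search)
mask = df.columns.str.contains(mask_pattrn, regex=True)

# A boolean array in the column position of .loc keeps only the True columns
subdf = df.loc[:, mask]
print(list(subdf.columns))
# ['HYTY', 'CDKL', 'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5']
```

Because the mask is positional, the selected columns keep their original order.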

Any help will be appreciated.

Upvotes: 0

Views: 2103

Answers (3)

BENY

Reputation: 323316

We can use str.startswith, which accepts a tuple of prefixes, and combine it with isin for the exact name:

relst = ['CDKL', 'BYUJI', 'BBNNI']

subdf = df.loc[:, df.columns.str.startswith(tuple(relst)) | df.columns.isin(['HYTY'])]
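A self-contained run of the startswith approach on the question's sample columns, assuming a hypothetical one-row DataFrame with placeholder values. It uses the prefix 'BBNNI' as named in the question; note that 'BBNI' would not match the column BBNNII#5:

```python
import pandas as pd

# Hypothetical DataFrame using the sample column names (values are placeholders)
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI', 'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([list(range(len(cols)))], columns=cols)

relst = ['CDKL', 'BYUJI', 'BBNNI']

# Prefix test on every column name; a tuple checks several prefixes at once
starts = df.columns.str.startswith(tuple(relst))
# Exact-name test for the column that must match as a whole
exact = df.columns.isin(['HYTY'])

# Element-wise OR of the two boolean arrays selects the columns
subdf = df.loc[:, starts | exact]
print(list(subdf.columns))
```

No regex is involved here, so characters such as '@' or '#' in the prefixes would need no escaping.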

Upvotes: 1

Jack

Reputation: 569

Using what you already have, you can pass your pattern to DataFrame.filter:

df.filter(regex=mask_pattrn)
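A runnable sketch of the filter approach, assuming a hypothetical one-row DataFrame built from the sample column names. If the patterns are passed through re.escape with their trailing '*' still attached, the asterisk becomes a literal character and only HYTY would match, so this sketch leaves the '*' off and anchors each prefix with '^':

```python
import re

import pandas as pd

# Hypothetical DataFrame using the sample column names (values are placeholders)
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI', 'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([list(range(len(cols)))], columns=cols)

# Escaped, start-anchored prefixes joined into one alternation
relst = ['HYTY', 'CDKL', 'BYUJI', 'BBNNI']
mask_pattrn = '|'.join('^' + re.escape(s) for s in relst)

# filter(regex=...) keeps the column labels for which re.search finds a match
subdf = df.filter(regex=mask_pattrn, axis=1)
print(list(subdf.columns))
```

axis=1 is written out only for clarity; for a DataFrame, filter already works on the columns by default.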

Upvotes: 2

RichieV

Reputation: 5183

Use re.findall() on each column name to build a list of the matching columns, then pass that list to df[mylist].
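A sketch of the list-based approach, assuming a hypothetical one-row DataFrame and start-anchored prefixes. It uses re.search because only a yes/no answer is needed per name; re.findall(pat, c) would work the same way as a truth test, since it returns an empty (falsy) list when nothing matches:

```python
import re

import pandas as pd

# Hypothetical DataFrame using the sample column names (values are placeholders)
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI', 'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([list(range(len(cols)))], columns=cols)

relst = ['HYTY', 'CDKL', 'BYUJI', 'BBNNI']
# Compile once, since the pattern is reused for every column name
pat = re.compile('|'.join('^' + re.escape(s) for s in relst))

# Plain list of the matching column names, in their original order
mylist = [c for c in df.columns if pat.search(c)]

subdf = df[mylist]
print(mylist)
```

Having the names as an ordinary list can be handy if you also want to reorder, log, or reuse the selection elsewhere.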

Upvotes: 1
