Reputation: 884
I have a pandas DataFrame with perhaps 1000 columns. However, I do not need that many; I only need the columns whose names match, start with, or contain specific strings.
So let's say I have a DataFrame whose columns look like df.columns =
HYTY, ABNH, CDKL, GHY@UIKI, BYUJI@#hy, BYUJI@tt, BBNNII#5, FGATAY@J, ....
I want to select only the columns whose names are like HYTY, CDKL, BYUJI* and BBNNI*.
What I tried was to build a list of regular expressions like:
import re
relst = ['HYTY', 'CDKL*', 'BYUJI*', 'BBNI*']
my_w_lst = [re.escape(s) for s in relst]
mask_pattrn = '|'.join(my_w_lst)
Then I create a boolean vector of True/False values that says whether each column name matches. However, I do not understand how to get a DataFrame containing only the selected (True) columns from this.
Any help will be appreciated.
Upvotes: 0
Views: 2103
Reputation: 323316
We can use str.startswith, which accepts a tuple of prefixes, combined with isin for the name that must match exactly:
relst = ['CDKL', 'BYUJI', 'BBNNI']  # 'BBNNI' so that it matches the BBNNII#5 column from the question
subdf = df.loc[:,df.columns.str.startswith(tuple(relst))|df.columns.isin(['HYTY'])]
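For reference, here is a minimal self-contained sketch of the same idea. The one-row frame is made up from the column names in the question, and the prefix is assumed to be BBNNI (as in the column list) rather than BBNI:

```python
import pandas as pd

# Hypothetical one-row frame using the column names from the question.
cols = ["HYTY", "ABNH", "CDKL", "GHY@UIKI", "BYUJI@#hy", "BYUJI@tt", "BBNNII#5", "FGATAY@J"]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

prefixes = ("CDKL", "BYUJI", "BBNNI")  # names that only need to start with these
exact = ["HYTY"]                        # names that must match exactly

# Boolean mask over the columns: prefix match OR exact match.
mask = df.columns.str.startswith(prefixes) | df.columns.isin(exact)
subdf = df.loc[:, mask]

selected = list(subdf.columns)
print(selected)
```

Because the mask is just a boolean array aligned with df.columns, this also answers the "how do I go from True/False to a DataFrame" part: pass it as the column indexer to df.loc.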
Upvotes: 1
Reputation: 569
Using what you already have, you can pass your pattern to filter:
df.filter(regex=mask_pattrn)
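One caveat, sketched below under some assumptions: re.escape turns the trailing * into a literal \*, so the pattern built in the question would only match HYTY, and df.filter(regex=...) uses re.search, so an unanchored pattern matches anywhere in the name. The sketch assumes HYTY and CDKL are meant as exact names and BYUJI and BBNNI as prefixes; the sample frame and the names exact, prefixes and mask_pattern are illustrative only.

```python
import re
import pandas as pd

# Hypothetical one-row frame using the column names from the question.
cols = ["HYTY", "ABNH", "CDKL", "GHY@UIKI", "BYUJI@#hy", "BYUJI@tt", "BBNNII#5", "FGATAY@J"]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

exact = ["HYTY", "CDKL"]       # assumed: must match the whole name
prefixes = ["BYUJI", "BBNNI"]  # assumed: name must start with these

# Escape the literal text only, then add the regex anchors ourselves.
parts = [re.escape(s) + "$" for s in exact] + [re.escape(s) for s in prefixes]
mask_pattern = "^(?:" + "|".join(parts) + ")"

# filter() keeps the columns whose name matches the pattern via re.search.
subdf = df.filter(regex=mask_pattern, axis=1)

selected = list(subdf.columns)
print(selected)
```

The leading ^ pins every alternative to the start of the column name, and the $ after the exact names keeps something like HYTY_2 from slipping through.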
Upvotes: 2
Reputation: 5183
Use re.findall(). It will give you a list of columns to pass to df[mylist].
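One possible way to do this, as a rough sketch: join the column names with newlines and let re.findall pick out whole lines in multiline mode. This assumes no column name contains a newline, and the pattern and sample frame are illustrative, not taken from the question.

```python
import re
import pandas as pd

# Hypothetical one-row frame using the column names from the question.
cols = ["HYTY", "ABNH", "CDKL", "GHY@UIKI", "BYUJI@#hy", "BYUJI@tt", "BBNNII#5", "FGATAY@J"]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

# Exact names HYTY and CDKL, or anything starting with BYUJI or BBNNI.
# Only non-capturing groups are used, so findall returns the full match.
pattern = r"^(?:HYTY|CDKL|(?:BYUJI|BBNNI).*)$"

# One column name per line; re.MULTILINE makes ^ and $ match at line boundaries.
mylist = re.findall(pattern, "\n".join(df.columns), flags=re.MULTILINE)

subdf = df[mylist]
print(mylist)
```

A plain list comprehension with re.match over df.columns would do the same job without the newline assumption.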
Upvotes: 1