SModi
SModi

Reputation: 125

Select columns with one of the strings in a list in their name?

I have a dataframe with columns that follow certain naming convention. I want to keep only those that have 'out' and 'agg' as prefixes in the header.

I've drafted the following code to achieve this. I created a list so that I can make this a small function and call it for any combination of col prefixes that I want to extract.

prefix = ['out', 'agg']
cols = []
for pref in prefix:
    cols = cols + [col for col in df.columns if pref in col]
df = df[cols].dropna(how='all', axis=0)

Is there a shorter/faster way to do this? I liked the solutions here:Drop columns whose name contains a specific string from pandas DataFrame but couldn't make them work for a list of strings.

thanks

Upvotes: 1

Views: 1320

Answers (1)

jezrael
jezrael

Reputation: 862481

Use DataFrame.filter with regex for match columns names by strings joined by | for regex or:

df = pd.DataFrame({
        'A_out':list('abcdef'),
         'B_out':[4,5,4,5,5,4],
         'C_agg':[7,8,9,4,2,3],
         'agg_D':[1,3,5,7,1,0],
         'out_E':[5,3,6,9,2,4],
         'F_agg':list('aaabbb')
})

prefix = ['out', 'agg']

If need match values for any positions in columns names:

df0 = df.filter(regex='|'.join(prefix)).dropna(how='all')
print (df0)
  A_out  B_out  C_agg  agg_D  out_E F_agg
0     a      4      7      1      5     a
1     b      5      8      3      3     a
2     c      4      9      5      6     a
3     d      5      4      7      9     b
4     e      5      2      1      2     b
5     f      4      3      0      4     b

If need only suffixes add $ for match end of strings:

df1 = df.filter(regex='|'.join(f'{x}$' for x in prefix)).dropna(how='all')
print (df1)
  A_out  B_out  C_agg F_agg
0     a      4      7     a
1     b      5      8     a
2     c      4      9     a
3     d      5      4     b
4     e      5      2     b
5     f      4      3     b

If need only prefixes add ^ for match start of strings:

df2 = df.filter(regex='|'.join(f'^{x}' for x in prefix)).dropna(how='all')
print (df2)
   agg_D  out_E
0      1      5
1      3      3
2      5      6
3      7      9
4      1      2
5      0      4

Upvotes: 2

Related Questions