Gaurav Bansal

Reputation: 5660

Select pandas dataframe columns based on which names contain strings in list

I have a dataframe, df, and a list of strings, cols_needed, which indicates the columns I want to retain in df. The column names in df do not exactly match the strings in cols_needed, so I cannot directly use something like an intersection. However, each column I want has a name that contains one of the strings in cols_needed. I tried using str.contains but couldn't get it to work. How can I subset df based on cols_needed?

import pandas as pd
df = pd.DataFrame({
    'sim-prod1': [1,2],
    'sim-prod2': [3,4],
    'sim-prod3': [5,6],
    'sim_prod4': [7,8]
})

cols_needed = ['prod1', 'prod2']

# What I want to obtain:
   sim-prod1  sim-prod2
0          1          3
1          2          4

Upvotes: 3

Views: 2101

Answers (3)

sammywemmy

Reputation: 28644

A list comprehension could work as well:

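# Keep each column whose name contains at least one of the needed substrings.
# any() lists a column only once, even if it matches several substrings.
columns = [col for col in df
           if any(name in col for name in cols_needed)]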

['sim-prod1', 'sim-prod2']

df.loc[:, columns]

   sim-prod1  sim-prod2
0          1          3
1          2          4

Upvotes: 3

ALollz

Reputation: 59549

Use the regex option of DataFrame.filter, which keeps the columns whose names match the pattern:

df.filter(regex='|'.join(cols_needed))

   sim-prod1  sim-prod2
0          1          3
1          2          4
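
One caveat, offered as a sketch rather than as part of the original answer: joining the raw strings with '|' treats each of them as a regular expression, so a substring containing characters such as '.' or '+' could match more than intended. Assuming the entries in cols_needed are meant as literal text, re.escape can be applied to each one first. The names pattern and result are illustrative only.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'sim-prod1': [1, 2],
    'sim-prod2': [3, 4],
    'sim-prod3': [5, 6],
    'sim_prod4': [7, 8],
})
cols_needed = ['prod1', 'prod2']

# Escape each substring so regex metacharacters are matched literally
# (assumption: cols_needed holds plain text, not regex patterns).
pattern = '|'.join(re.escape(name) for name in cols_needed)

# filter(regex=...) keeps the columns whose names match the pattern anywhere.
result = df.filter(regex=pattern)
print(result)
```

For the sample data in the question the escaping changes nothing, since 'prod1' and 'prod2' contain no special characters.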

Upvotes: 3

Quang Hoang

Reputation: 150745

You can use str.contains with a joined pattern, for example:

df.loc[:,df.columns.str.contains('|'.join(cols_needed))]

Output:

   sim-prod1  sim-prod2
0          1          3
1          2          4
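
To see why this works, it may help to look at the intermediate boolean mask. The following is a minimal sketch on the question's data that builds the mask as a separate step; the names mask and result are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'sim-prod1': [1, 2],
    'sim-prod2': [3, 4],
    'sim-prod3': [5, 6],
    'sim_prod4': [7, 8],
})
cols_needed = ['prod1', 'prod2']

# One boolean per column: True where the name contains any needed substring.
mask = df.columns.str.contains('|'.join(cols_needed))
print(mask.tolist())

# Select the columns flagged True, keeping all rows.
result = df.loc[:, mask]
print(result)
```

The mask has one entry per column, so df.loc[:, mask] keeps exactly the columns marked True.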

Upvotes: 3
