Reputation: 367

Keep only columns in Pandas Dataframe based on multiple conditions

Suppose the following dataframe:

import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height of Person': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
        'Country is': ['US', 'UK', 'GE', 'ET']     
       }
df = pd.DataFrame(data)
display(df)

I would like to specify which columns should remain in the dataframe, based on a number of strings that appear in the column names.

E.g., keeping the columns whose names contain "Name" or "Country" should result in:

data2 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Country is': ['US', 'UK', 'GE', 'ET']   
       }
df2 = pd.DataFrame(data2)
display(df2)

I tried using

df = df.filter(like=["Name"])

but I am not sure how to apply multiple expressions (strings) at once.

Upvotes: 0

Answers (4)

Metro

Reputation: 1

I usually use .loc and find it clearer to read.

df = df.loc[:, df.columns.str.contains('Name|Country', regex=True)]

Upvotes: 0

René

Reputation: 4827

This should work:

col_filter = df.columns.str.contains('Name') | df.columns.str.contains('Country')
df.loc[:,col_filter]

Result:

     Name Country is
0     Jai         US
1  Princi         UK
2  Gaurav         GE
3    Anuj         ET
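If the list of substrings grows, the same mask idea can probably be generalized rather than chaining one term per string. A minimal sketch, assuming the substrings live in a list (the name `substrings` is made up for illustration) and that they should be matched literally, not as regexes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height of Person': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
                   'Country is': ['US', 'UK', 'GE', 'ET']})

# Hypothetical list of substrings to look for in the column names
substrings = ['Name', 'Country']

# One boolean mask per substring; regex=False treats each string literally
masks = [df.columns.str.contains(s, regex=False) for s in substrings]

# Element-wise OR across all masks: keep a column if any substring matches
col_filter = np.logical_or.reduce(masks)

result = df.loc[:, col_filter]
print(result)
```

Swapping `np.logical_or` for `np.logical_and` would instead keep only columns whose names contain every substring.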

Upvotes: 0

mozway

Reputation: 260335

If you want to filter by name, you can use filter with a regex:

df.filter(regex='Name|Country')
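When the strings come from a list, the pattern can be assembled instead of typed by hand. A rough sketch, assuming a list called `substrings` (a made-up name) and using `re.escape` so that characters such as `.` or `(` in a string are not interpreted as regex syntax:

```python
import re
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height of Person': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
                   'Country is': ['US', 'UK', 'GE', 'ET']})

# Hypothetical list of substrings to keep
substrings = ['Name', 'Country']

# Join the escaped strings with '|' to build an "any of these" pattern
pattern = '|'.join(re.escape(s) for s in substrings)

# filter(regex=...) keeps the columns whose label matches the pattern anywhere
result = df.filter(regex=pattern)
print(result)
```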

Upvotes: 2

Alex F

Reputation: 2274

If you're trying to filter just on columns you can do:

df = df[[x for x in df.columns if x in ['Name', 'Country is']]]
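Note that this matches full column names exactly, not substrings. An equivalent sketch without the list comprehension, using `Index.isin`; the list `wanted` and its extra `'Age'` entry are made up here to show that names missing from the dataframe are simply ignored:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height of Person': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
                   'Country is': ['US', 'UK', 'GE', 'ET']})

# Hypothetical list of exact column names; 'Age' does not exist in df
wanted = ['Name', 'Country is', 'Age']

# isin builds a boolean mask over the columns, so absent names raise no KeyError
exact = df.loc[:, df.columns.isin(wanted)]
print(exact)
```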

Upvotes: 0
