How to exclude and filter few columns in pandas?

Question

I know we can select only few columns using pandas dataframe filter, but can we also exclude only some columns?

Here is MWE:

import numpy as np
import pandas as pd


df = pd.DataFrame({'id': [1,2,3], 'num_1': [10,20,30], 'num_2': [20,30,40]})

df.filter(regex='num')

Can we select all columns not having 'num' in the columns:

Something like:

df.filter(regex='^(num)')

Required Output

Note

# these already works, i am only looking regex way
df[['id']] # gives the required output

Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html

BhishanPoudel · Accepted Answer

Using contains

df.loc[:,~df.columns.str.contains('num')]
df.loc[:,~df.columns.str.startswith('num')]

Using difference

df[df.columns.difference(['num_1','num_2'])]
df[df.columns.difference([i for i in df.columns if i.startswith('num'])]
df[df.columns.difference([i for i in df.columns if 'num' in i]

Using drop

df[df.columns.drop(['num_1','num_2'])] # do not use axis=1 here.
df[df.columns.drop([i for i in df.columns if i.startswith('num'])]
df[df.columns.drop([i for i in df.columns if 'num' in i]

Credits to @ALollz
If you insist on using filter:

df.filter(regex='^((?!num).)*$')

But this is not only ugly and difficult to read/understand but also vulnerable to some pitfalls of regex search as discussed in Regular expression to match a line that doesn't contain a word .

I would suggest to stick with contains.

How to exclude and filter few columns in pandas?

Answers (2)

Related Questions