BhishanPoudel
BhishanPoudel

Reputation: 17164

How to exclude and filter few columns in pandas?

I know we can select only few columns using pandas dataframe filter, but can we also exclude only some columns?

Here is MWE:

import numpy as np
import pandas as pd


df = pd.DataFrame({'id': [1,2,3], 'num_1': [10,20,30], 'num_2': [20,30,40]})

df.filter(regex='num')

Can we select all columns not having 'num' in the columns:

Something like:

df.filter(regex='^(num)')

Required Output

   id
0   1
1   2
2   3

Note

# these already works, i am only looking regex way
df[['id']] # gives the required output

Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html

Upvotes: 4

Views: 8556

Answers (2)

BhishanPoudel
BhishanPoudel

Reputation: 17164

Using contains

df.loc[:,~df.columns.str.contains('num')]
df.loc[:,~df.columns.str.startswith('num')]

Using difference

df[df.columns.difference(['num_1','num_2'])]
df[df.columns.difference([i for i in df.columns if i.startswith('num'])]
df[df.columns.difference([i for i in df.columns if 'num' in i]

Using drop

df[df.columns.drop(['num_1','num_2'])] # do not use axis=1 here.
df[df.columns.drop([i for i in df.columns if i.startswith('num'])]
df[df.columns.drop([i for i in df.columns if 'num' in i]

Credits to @ALollz
If you insist on using filter:

df.filter(regex='^((?!num).)*$')

But this is not only ugly and difficult to read/understand but also vulnerable to some pitfalls of regex search as discussed in Regular expression to match a line that doesn't contain a word .

I would suggest to stick with contains.

Upvotes: 6

Adarsh Chavakula
Adarsh Chavakula

Reputation: 1599

You can get a list of columns not having num using list comprehension

cols_without_num = [x for x in list(df) if 'num' not in x]

Then subset the data

df[cols_without_num]

Upvotes: 4

Related Questions