Reputation: 17164
I know we can select only few columns using pandas dataframe filter, but can we also exclude only some columns?
Here is MWE:
import numpy as np
import pandas as pd
df = pd.DataFrame({'id': [1,2,3], 'num_1': [10,20,30], 'num_2': [20,30,40]})
df.filter(regex='num')
Can we select all columns not having 'num' in the columns:
Something like:
df.filter(regex='^(num)')
Required Output
id
0 1
1 2
2 3
Note
# these already works, i am only looking regex way
df[['id']] # gives the required output
Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html
Upvotes: 4
Views: 8556
Reputation: 17164
Using contains
df.loc[:,~df.columns.str.contains('num')]
df.loc[:,~df.columns.str.startswith('num')]
Using difference
df[df.columns.difference(['num_1','num_2'])]
df[df.columns.difference([i for i in df.columns if i.startswith('num'])]
df[df.columns.difference([i for i in df.columns if 'num' in i]
Using drop
df[df.columns.drop(['num_1','num_2'])] # do not use axis=1 here.
df[df.columns.drop([i for i in df.columns if i.startswith('num'])]
df[df.columns.drop([i for i in df.columns if 'num' in i]
Credits to @ALollz
If you insist on using filter
:
df.filter(regex='^((?!num).)*$')
But this is not only ugly and difficult to read/understand but also vulnerable to some pitfalls of regex search as discussed in Regular expression to match a line that doesn't contain a word .
I would suggest to stick with contains
.
Upvotes: 6
Reputation: 1599
You can get a list of columns not having num
using list comprehension
cols_without_num = [x for x in list(df) if 'num' not in x]
Then subset the data
df[cols_without_num]
Upvotes: 4