Reputation: 1850
Take the DataFrame
in the answer of Loc vs. iloc vs. ix vs. at vs. iat? for example.
df = pd.DataFrame(
{'age':[30, 2, 12, 4, 32, 33, 69],
'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'height':[165, 70, 120, 80, 180, 172, 150],
'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia']
)
Now I want all columns except 'food' and 'height'.
I thought something like df.loc[:,['age':'color', 'score':'state']]
would work, but Python returns SyntaxError: invalid syntax
.
I am aware of that there is one way to work around: df.drop(columns = ['food', 'height'])
. However, in my real life situation, I have hundreds of columns to be dropped. Typing out all column names is so inefficient.
I am expecting something similar with dplyr::select(df, -(food:height))
or dplyr::select(df, age:color, score:state)
in R language.
Also have read Selecting/Excluding sets of columns in Pandas.
Upvotes: 3
Views: 1758
Reputation: 28644
One option for flexible column selection is with select_columns from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.select_columns(slice('age', 'color'), slice('score', 'state'))
age color score state
Jane 30 blue 4.6 NY
Nick 2 green 8.3 TX
Aaron 12 red 9.0 FL
Penelope 4 white 3.3 AL
Dean 32 gray 1.8 AK
Christina 33 black 9.5 TX
Cornelia 69 red 2.2 TX
df.select_columns(slice('food', 'height'), invert = True)
age color score state
Jane 30 blue 4.6 NY
Nick 2 green 8.3 TX
Aaron 12 red 9.0 FL
Penelope 4 white 3.3 AL
Dean 32 gray 1.8 AK
Christina 33 black 9.5 TX
Cornelia 69 red 2.2 TX
Upvotes: 0
Reputation: 402523
First, find all columns lying between food
and height
(inclusive).
c = df.iloc[-1:0].loc[:, 'food':'height'].columns
Next, filter with difference
/isin
/setdiff1d
-
df[df.columns.difference(c)]
Or,
df.loc[:, ~df.columns.isin(c)]
Or,
df[np.setdiff1d(df.columns, c)]
age color score state
Jane 30 blue 4.6 NY
Nick 2 green 8.3 TX
Aaron 12 red 9.0 FL
Penelope 4 white 3.3 AL
Dean 32 gray 1.8 AK
Christina 33 black 9.5 TX
Cornelia 69 red 2.2 TX
Upvotes: 1
Reputation: 862681
First get positions of columns names by Index.get_loc
and then use numpy.r_
for join all slicers together:
a = np.r_[df.columns.get_loc('age'):df.columns.get_loc('color')+1,
df.columns.get_loc('score'):df.columns.get_loc('state')+1]
df = df.iloc[:, a]
print (df)
age color score state
Jane 30 blue 4.6 NY
Nick 2 green 8.3 TX
Aaron 12 red 9.0 FL
Penelope 4 white 3.3 AL
Dean 32 gray 1.8 AK
Christina 33 black 9.5 TX
Cornelia 69 red 2.2 TX
Upvotes: 0