Reputation: 891
I have a large pandas DataFrame (>100 columns). I need to drop several sets of columns, and I'm hoping there is a way to do it with the usual
df.drop(df.columns['slices'],axis=1)
I've built selections such as:
a = df.columns[3:23]
b = df.columns[-6:]
where a and b represent the column sets I want to drop.
The following
list(df)[3:23]+list(df)[-6:]
yields the correct selection, but I can't get it to work with drop:
df.drop(df.columns[list(df)[3:23]+list(df)[-6:]],axis=1)
ValueError: operands could not be broadcast together with shapes (20,) (6,)
I looked around but couldn't find an answer.
Selecting last n columns and excluding last n columns in dataframe
(Below pertains to the error I receive):
python numpy ValueError: operands could not be broadcast together with shapes
This one feels like they're having a similar issue, but the 'slices' aren't separate: Deleting multiple columns based on column names in Pandas
Cheers
Upvotes: 13
Views: 31177
Reputation: 8418
You can use this simple solution:
cols = [3,7,10,12,14,16,18,20,22]
df.drop(df.columns[cols],axis=1,inplace=True)
The result:
0 1 2 4 5 6 8 9 11 13 15 17 19 21
0 3 12 10 3 2 1 7 512 64 1024.0 -1.0 -1.0 -1.0 -1.0
1 5 12 10 3 2 1 7 16 2 32.0 32.0 1024.0 -1.0 -1.0
2 5 12 10 3 2 1 7 512 2 32.0 32.0 32.0 -1.0 -1.0
3 5 12 10 3 2 1 7 16 1 32.0 64.0 1024.0 -1.0 -1.0
As you can see, all the columns at the given positions have been deleted.
You can also use column names instead of integer positions. If your columns are named A, B, C, and so on, put the names in cols and pass cols to df.drop directly (df.columns[cols] only accepts positions), for example:
cols = ['A','B','C','F']
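A minimal sketch of the name-based variant, assuming a small made-up frame with columns A to F (the data is hypothetical). Because cols holds labels rather than positions, it goes to drop as-is instead of through df.columns[...]:

```python
import pandas as pd

# Hypothetical one-row frame with columns A..F, for illustration only
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=list("ABCDEF"))

# Labels are passed straight to drop; no df.columns[...] lookup is needed
cols = ['A', 'B', 'C', 'F']
df.drop(cols, axis=1, inplace=True)

print(list(df))
```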
Upvotes: 7
Reputation: 141
This returns the dataframe with the columns removed:
df.drop(list(df)[2:5], axis=1)
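The same idea seems to extend to the two separate slices from the question, since list(df) is a plain Python list and + concatenates lists rather than adding element-wise. A sketch, assuming a made-up 30-column frame (the names c0..c29 are hypothetical):

```python
import pandas as pd

# Hypothetical 30-column frame; column names c0..c29 are invented for this example
df = pd.DataFrame([list(range(30))], columns=[f"c{i}" for i in range(30)])

# Two slices of plain label lists, joined by list concatenation
to_drop = list(df)[3:23] + list(df)[-6:]

# drop returns a new frame; df itself is left unchanged
trimmed = df.drop(to_drop, axis=1)
print(list(trimmed))
```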
Upvotes: 11
Reputation: 164803
You can use np.r_ to seamlessly combine multiple ranges / slices:
import numpy as np
import pandas as pd
from string import ascii_uppercase
df = pd.DataFrame(columns=list(ascii_uppercase))
idx = np.r_[3:10, -5:0]
print(idx)
array([ 3, 4, 5, 6, 7, 8, 9, -5, -4, -3, -2, -1])
You can then use idx to index your columns and pass the result to pd.DataFrame.drop:
df.drop(df.columns[idx], axis=1, inplace=True)
print(df.columns)
Index(['A', 'B', 'C', 'K', 'L', 'M', 'N',
'O','P', 'Q', 'R', 'S', 'T', 'U'], dtype='object')
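If you would rather not modify df in place, the same lookup can be wrapped in a small helper. This is only a sketch: drop_by_position is a hypothetical name, not a pandas function, and the zero-filled frame is made up for illustration:

```python
import numpy as np
import pandas as pd
from string import ascii_uppercase

def drop_by_position(df, idx):
    """Return a new frame without the columns at the given integer positions."""
    return df.drop(columns=df.columns[idx])

# Hypothetical 2x26 frame with columns A..Z
df = pd.DataFrame(np.zeros((2, 26)), columns=list(ascii_uppercase))

# np.r_ turns the slices into one integer array: positions 3..9 and the last 5
out = drop_by_position(df, np.r_[3:10, -5:0])
print(list(out.columns))
```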
Upvotes: 8
Reputation: 314
I ran into a similar issue before and fixed it by "subtracting" one df from the other. I'm not sure whether this will work for you, but it did for me:
df = df[~df.index.isin(a.index)]
df = df[~df.index.isin(b.index)]
Upvotes: 1
Reputation: 7848
IIUC:
a = df.columns[3:23].values.tolist()
b = df.columns[-6:].values.tolist()
a.extend(b)
df.drop(a, axis=1, inplace=True)
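A variation that skips the list conversion, sketched under the assumption of a made-up 30-column frame (names c0..c29 are hypothetical): Index.append joins two Index objects end to end, whereas + on them is treated as element-wise arithmetic.

```python
import pandas as pd

# Hypothetical 30-column frame; column names c0..c29 are invented for this example
df = pd.DataFrame([list(range(30))], columns=[f"c{i}" for i in range(30)])

a = df.columns[3:23]
b = df.columns[-6:]

# Index.append concatenates the two Index objects into one
to_drop = a.append(b)

df = df.drop(columns=to_drop)
print(list(df.columns))
```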
Upvotes: 2