rmahesh

Reputation: 749

Axis error when dropping specific columns in Pandas

I have identified specific columns I want to use as predictors for my model, based on some analysis. I have captured those column numbers and stored them in a list. I have roughly 80 columns and want to loop through them and drop the columns not in this list. X_train is the DataFrame in which I want to do this. Here is my code:

cols_selected = [24, 4, 7, 50, 2, 60, 46, 53, 48, 61]
cols_drop = []

for x in range(len(X_train.columns)):
    if x in cols_selected:
        pass
    else:
        X_train.drop([x])

When I run this, I get the following error, pointing at the line X_train.drop([x]):

KeyError: '[3] not found in axis'

I am sure it is something very simple that I am missing. I tried adding the inplace=True or axis=1 arguments, and all of them gave the same error message (only the value inside the [] changed).

Any help would be great!

Edit: Here is the change that got this working:

cols_selected = [24, 4, 7, 50, 2, 60, 46, 53, 48, 61]
cols_drop = []

for x in range(len(X_train.columns)):
    if x in cols_selected:
        pass
    else:
        cols_drop.append(x)

X_train = X_train.drop(X_train.columns[cols_drop], axis=1)
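Since the goal is to keep a handful of columns by position, a shorter alternative is to select them directly with iloc instead of dropping everything else. This is a swapped-in approach, not the one used above, sketched on a hypothetical 80-column frame with made-up labels c0..c79; sorting the positions keeps the columns in their original left-to-right order, which matches what the drop-based version leaves behind.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train: 5 rows x 80 columns named c0..c79.
X_train = pd.DataFrame(np.zeros((5, 80)), columns=[f"c{i}" for i in range(80)])

cols_selected = [24, 4, 7, 50, 2, 60, 46, 53, 48, 61]

# iloc selects by position, so the positional list can be used as-is.
# sorted() keeps the original column order; without it the columns
# come out in the order listed in cols_selected.
X_train = X_train.iloc[:, sorted(cols_selected)]
print(X_train.columns.tolist())
```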

Upvotes: 0

Views: 21284

Answers (3)

matthiasdenu

Reputation: 397

In addition to @pygo pointing out that df.drop takes a keyword argument to designate the axis, try this:

X_train = X_train[[col for col in X_train.columns if col in cols_selected]] 

Here is an example:

>>> import numpy as np
>>> import pandas as pd
>>> cols_selected = ['a', 'c', 'e']
>>> X_train = pd.DataFrame(np.random.randint(low=0, high=10, size=(20, 5)), columns=['a', 'b', 'c', 'd', 'e'])
>>> X_train
    a  b  c  d  e
0   4  0  3  5  9
1   8  8  6  7  2
2   1  0  2  0  2
3   3  8  0  5  9
4   5  9  7  8  0
5   1  9  3  5  9 ...
>>> X_train = X_train[[col for col in X_train.columns if col in cols_selected]]
>>> X_train
    a  c  e
0   4  3  9
1   8  6  2
2   1  2  2
3   3  0  9
4   5  7  0
5   1  3  9 ...
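The list comprehension above assumes cols_selected holds column labels, as in this example, rather than positions (for a positional list, X_train.iloc[:, cols_selected] is the positional counterpart). An equivalent sketch uses a boolean mask built with Index.isin; it uses small fixed data instead of random numbers so the result is reproducible, and it preserves the original column order just like the comprehension.

```python
import pandas as pd

cols_selected = ["a", "c", "e"]

# Fixed data (instead of np.random) so the output is reproducible.
X_train = pd.DataFrame(
    [[4, 0, 3, 5, 9], [8, 8, 6, 7, 2]],
    columns=["a", "b", "c", "d", "e"],
)

# Boolean mask over the column labels: True where the label is selected.
by_mask = X_train.loc[:, X_train.columns.isin(cols_selected)]

# The list-comprehension version from the answer, for comparison.
by_comprehension = X_train[[col for col in X_train.columns if col in cols_selected]]

print(by_mask)
```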

Upvotes: 1

0x51ba

Reputation: 463

According to the documentation for drop:

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names
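To see what this means for the loop in the question, here is a minimal sketch on a hypothetical three-column frame with string labels and a non-default row index (the real X_train is not shown). Without axis, drop searches the row labels; with axis=1 it searches the column labels; in neither case is a bare position accepted.

```python
import pandas as pd

# Hypothetical frame: string column labels and a non-default row index,
# similar to what a shuffled train/test split can produce.
df = pd.DataFrame(
    {"f0": [1, 2, 3], "f1": [4, 5, 6], "f2": [7, 8, 9]},
    index=[10, 0, 7],
)

# drop() matches labels. Without axis it searches the row index (axis=0),
# which has no label 2, so it raises KeyError.
try:
    df.drop([2])
except KeyError as err:
    print("axis=0:", err)

# With axis=1 it searches the column labels; 2 is a valid *position* here,
# but not a label, so this raises KeyError too.
try:
    df.drop([2], axis=1)
except KeyError as err:
    print("axis=1:", err)

# Translating the position to its label first works.
print(df.drop(df.columns[2], axis=1).columns.tolist())
```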

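You cannot drop columns by their position alone; drop needs the column names (labels). The axis parameter also has to be set to 1 or 'columns'. Replace X_train.drop([x]) with X_train = X_train.drop(cols[x], axis='columns'), where cols = X_train.columns is captured once before the loop. Capturing the labels first matters: each drop shifts the positions of the remaining columns, so looking up X_train.columns[x] inside the loop would drop the wrong columns after the first iteration.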
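A minimal sketch of the corrected loop, assuming a hypothetical 80-column frame with made-up labels c0..c79 in place of the real X_train. The labels are captured before the loop so that positions refer to the original layout even as columns are removed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train: 2 rows x 80 columns named c0..c79.
X_train = pd.DataFrame(np.zeros((2, 80)), columns=[f"c{i}" for i in range(80)])
cols_selected = [24, 4, 7, 50, 2, 60, 46, 53, 48, 61]

# Capture the original labels once. Every drop shifts the positions of the
# remaining columns, so X_train.columns[x] would point at the wrong column
# after the first iteration.
cols = X_train.columns

for x in range(len(cols)):
    if x not in cols_selected:
        # drop() matches labels, and axis="columns" targets columns, not rows.
        X_train = X_train.drop(cols[x], axis="columns")

print(X_train.columns.tolist())
```

Each call copies the frame, so for many columns it is cheaper to collect the unwanted labels first and pass them to a single drop call.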

Upvotes: 2

Karn Kumar

Reputation: 8816

I am going by the question title:

Example DataFrame:

>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Dropping specific columns B and C:

>>> df.drop(['B', 'C'], axis=1)
# use inplace=True to change df itself: df.drop(['B', 'C'], axis=1, inplace=True)
   A   D
0  0   3
1  4   7
2  8  11

If you want to drop them by column number (that is, by position), try the following:

>>> df.drop(df.columns[[1, 2]], axis=1)
   A   D
0  0   3
1  4   7
2  8  11

OR

>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11
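As a quick check, the sketch below rebuilds the example frame from this answer (values 0 to 11 in columns A to D) and confirms that the three forms give the same result, and that inplace=True changes the frame itself and returns None.

```python
import numpy as np
import pandas as pd

# The example frame from this answer: values 0..11 in 3 rows, columns A-D.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])

by_label = df.drop(["B", "C"], axis=1)             # labels plus axis
by_position = df.drop(df.columns[[1, 2]], axis=1)  # positions turned into labels
by_keyword = df.drop(columns=["B", "C"])           # columns= keyword

# inplace=True modifies the frame it is called on and returns None.
result = df.copy()
returned = result.drop(["B", "C"], axis=1, inplace=True)

print(by_label)
```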

Upvotes: 1
