Reputation: 749
I have identified specific columns I want to select as my predictors for my model based on some analysis. I have captured those column numbers and stored it in a list. I have roughly 80 columns and want to loop through and drop the columns not in this specific list. X_train is the column in which I want to do this. Here is my code:
cols_selected = [24, 4, 7, 50, 2, 60, 46, 53, 48, 61]
cols_drop = []
for x in range(len(X_train.columns)):
if x in cols_selected:
pass
else:
X_train.drop([x])
When running this, I am faced with the following error while highlighting the code: X_train.drop([x]):
KeyError: '[3] not found in axis'
I am sure it is something very simple that I am missing. I tried including the inplace=True or axis=1 statements along with this and all of them had the same error message (while the value inside the [] changed with those error codes).
Any help would be great!
Edit: Here is the addition to get this working:
cols_selected = [24, 4, 7, 50, 2, 60, 46, 53, 48, 61]
cols_drop = []
for x in range(len(X_train.columns)):
if x in cols_selected:
pass
else:
cols_drop.append(x)
X_train = X_train.drop(X_train.columns[[cols_drop]], axis=1)
Upvotes: 0
Views: 21284
Reputation: 397
Also, in addition to @pygo pointing out that df.drop takes a keyword arg to designate the axis, try this:
X_train = X_train[[col for col in X_train.columns if col in cols_selected]]
Here is an example:
>>> import numpy as np
>>> import pandas as pd
>>> cols_selected = ['a', 'c', 'e']
>>> X_train = pd.DataFrame(np.random.randint(low=0, high=10, size=(20, 5)), columns=['a', 'b', 'c', 'd', 'e'])
>>> X_train
a b c d e
0 4 0 3 5 9
1 8 8 6 7 2
2 1 0 2 0 2
3 3 8 0 5 9
4 5 9 7 8 0
5 1 9 3 5 9 ...
>>> X_train = X_train[[col for col in X_train.columns if col in cols_selected]]
>>> X_train
a c e
0 4 3 9
1 8 6 2
2 1 2 2
3 3 0 9
4 5 7 0
5 1 3 9 ...
Upvotes: 1
Reputation: 463
According to the documentation of drop:
Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names
You can not drop columns by simply using the index of the column. You need the name of the columns. Also the axis
parameter has to be set to 1
or columns
Replace X_train.drop([x])
with X_train=X_train.drop(X_train.columns[x], axis='columns')
to make your example work.
Upvotes: 2
Reputation: 8816
I am just assuming as per the question litle:
Example DataFrame:
>>> df
A B C D
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
Dropping Specific columns B
& C
:
>>> df.drop(['B', 'C'], axis=1)
# df.drop(['B', 'C'], axis=1, inplace=True) <-- to make the change the df itself , use inplace=True
A D
0 0 3
1 4 7
2 8 11
If you are trying to drop them by column numbers (Dropping by index
) then try like below:
>>> df.drop(df.columns[[1, 2]], axis=1)
A D
0 0 3
1 4 7
2 8 11
OR
>>> df.drop(columns=['B', 'C'])
A D
0 0 3
1 4 7
2 8 11
Upvotes: 1