Reputation: 1046
I am trying to find the feature importances for a Random Forest classification task, but I get the following error:
'numpy.ndarray' object has no attribute 'columns'
Here is a portion of my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# importing dataset
dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values
# splitting dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
#feature importance
feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)
I expect this to give the feature importance score for each column of my dataset. (Note: the original data is in CSV format.)
Upvotes: 1
Views: 19694
Reputation: 103
iloc, loc and the columns attribute exist only on pandas objects (DataFrame or Series), not on NumPy arrays. You are using columns on X_train, which is an array. Solution: convert the array into a DataFrame, then use columns (or iloc/loc) on it.
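A minimal sketch of that conversion, assuming a small synthetic DataFrame with hypothetical columns age, balance, tenure and exited in place of Churn_Modelling.csv. The arrays are split exactly as in the question, then the training array is wrapped in a DataFrame whose labels come from the same column slice used to build X.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Churn_Modelling.csv: three numeric features and a 0/1 target
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "age": rng.integers(18, 80, size=100),
    "balance": rng.normal(50_000, 10_000, size=100),
    "tenure": rng.integers(0, 10, size=100),
    "exited": rng.integers(0, 2, size=100),
})

# As in the question: .values produces NumPy arrays, so X_train has no .columns yet
X = dataset.iloc[:, 0:3].values
Y = dataset.iloc[:, 3].values
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=0)

# Convert the training array back into a DataFrame, reusing the selected column labels
X_train = pd.DataFrame(X_train, columns=dataset.columns[0:3])

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)

# X_train is now a DataFrame, so X_train.columns can label the importances
feature_importances = pd.DataFrame(
    regressor.feature_importances_, index=X_train.columns, columns=["importance"]
).sort_values("importance", ascending=False)
print(feature_importances)
```

If you later call predict on X_test, convert it the same way; recent scikit-learn versions emit a warning when a model fitted with feature names receives a plain array.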
Upvotes: 0
Reputation: 33147
Use this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# importing dataset
dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values
# splitting dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
# feature importance: label the importances with the same 9 columns (3:12) used to build X
feature_importances = pd.DataFrame(regressor.feature_importances_, index=dataset.columns[3:12], columns=['importance']).sort_values('importance', ascending=False)
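The index passed to pd.DataFrame needs exactly one label per feature the model was trained on, so it has to be sliced the same way as X. This sketch uses a hypothetical synthetic frame (an ID column, three features and a target) to show that labelling with every column of the dataset raises ValueError, while the sliced labels line up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in: an ID column, three feature columns and a 0/1 target
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "row_id": np.arange(100),
    "age": rng.integers(18, 80, size=100),
    "balance": rng.normal(50_000, 10_000, size=100),
    "tenure": rng.integers(0, 10, size=100),
    "exited": rng.integers(0, 2, size=100),
})

X = dataset.iloc[:, 1:4].values   # only the three feature columns
Y = dataset.iloc[:, 4].values

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X, Y)

# dataset.columns has 5 labels but there are only 3 importances, so pandas rejects it
mismatch_raised = False
try:
    pd.DataFrame(regressor.feature_importances_, index=dataset.columns, columns=["importance"])
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)

# Slicing the labels exactly like the data gives one label per feature
feature_names = dataset.columns[1:4]
feature_importances = pd.DataFrame(
    regressor.feature_importances_, index=feature_names, columns=["importance"]
).sort_values("importance", ascending=False)
print(feature_importances)
```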
Upvotes: 1
Reputation: 1493
The X_train that comes out of train_test_split is a NumPy array here, and a NumPy array never has a columns attribute.
Secondly, you call .values when you build X from dataset, which returns a numpy.ndarray rather than a DataFrame.
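A minimal sketch of the alternative this points to: leave out .values so X stays a DataFrame. train_test_split returns pandas objects when it is given pandas objects, so X_train.columns then exists. The data is a hypothetical synthetic frame rather than Churn_Modelling.csv, and because the question describes a classification task the sketch swaps in RandomForestClassifier; feature_importances_ is available on both the classifier and the regressor.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Churn_Modelling.csv: three numeric features and a 0/1 target
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "age": rng.integers(18, 80, size=100),
    "balance": rng.normal(50_000, 10_000, size=100),
    "tenure": rng.integers(0, 10, size=100),
    "exited": rng.integers(0, 2, size=100),
})

# .values turns the selection into a plain NumPy array and drops the column labels
print(type(dataset.iloc[:, 0:3].values).__name__)   # ndarray

# Without .values, X stays a DataFrame and train_test_split returns DataFrames too
X = dataset.iloc[:, 0:3]
Y = dataset.iloc[:, 3]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=0)
print(type(X_train).__name__)                        # DataFrame

# A classifier, since the target is a class label
classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)

# X_train.columns now exists, so it can label the importances directly
feature_importances = pd.DataFrame(
    classifier.feature_importances_, index=X_train.columns, columns=["importance"]
).sort_values("importance", ascending=False)
print(feature_importances)
```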
You need to change the line
feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)
to
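columns_ = dataset.columns[3:12]  # the same 9 columns that were used to build X
# in the snippet above the fitted model is named regressor, not rf
feature_importances = pd.DataFrame(regressor.feature_importances_, index=columns_, columns=['importance']).sort_values('importance', ascending=False)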
Upvotes: 0