Reputation: 1046

'numpy.ndarray' object has no attribute 'columns'

I am trying to compute the feature importances for a Random Forest classification task, but I get the following error:

'numpy.ndarray' object has no attribute 'columns'

Here is a portion of my code :

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# importing dataset

dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values

# splitting the dataset into a test set and a train set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)  
regressor.fit(X_train, y_train) 

#feature importance

feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)

I expected this to give the feature importance score for each column of my dataset. (Note: the original data is in CSV format.)

Upvotes: 1

Answers (3)

Shashank

Reputation: 103

The iloc and loc indexers can be applied only to a pandas DataFrame, and you are applying them to an array. Solution: convert the array into a DataFrame, then apply iloc or loc.
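A minimal sketch of that conversion, assuming a small made-up array and illustrative column names (neither is taken from the question's CSV file):

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's feature matrix.
X = np.array([[619, 42, 2], [608, 41, 1], [502, 42, 8]])

# A plain ndarray carries no column labels.
has_columns_before = hasattr(X, "columns")

# Wrapping the array in a DataFrame attaches labels; these names are illustrative.
X_df = pd.DataFrame(X, columns=["CreditScore", "Age", "Tenure"])
column_names = list(X_df.columns)
first_row = X_df.iloc[0].tolist()
```

After the conversion, X_df.columns and X_df.iloc both work, whereas the original array supports neither.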

Upvotes: 0

seralouk

Reputation: 33147

Use this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# importing dataset

dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values

# splitting the dataset into a test set and a train set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)  
regressor.fit(X_train, y_train) 

#feature importance

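# use only the names of the columns selected for X (3:12), so the index length matches the number of features
feature_importances = pd.DataFrame(regressor.feature_importances_,index = dataset.columns[3:12],columns=['importance']).sort_values('importance',ascending=False)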

Upvotes: 1

Shivam Kotwalia

Reputation: 1493

The X_train that comes out of train_test_split is a NumPy array here, and a NumPy array has no columns attribute. The reason is that you call .values when you build X from dataset, which returns a numpy.ndarray and not a DataFrame.

You need to change the line

feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)

to the following (note that the fitted model in your code is named regressor, not rf):

columns_ = dataset.iloc[:1, 3:12].columns

feature_importances = pd.DataFrame(regressor.feature_importances_,index = columns_,columns=['importance']).sort_values('importance',ascending=False)
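An alternative sketch, not part of the original code: if you drop .values and pass DataFrames to train_test_split, the splits stay DataFrames and X_train.columns works directly. The data below is synthetic and the column names are illustrative stand-ins for Churn_Modelling.csv.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Churn_Modelling.csv; column names are illustrative.
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "Age": rng.integers(18, 80, size=200),
    "Tenure": rng.integers(0, 10, size=200),
    "Balance": rng.uniform(0, 1e5, size=200),
})
dataset["Exited"] = (dataset["Age"] > 50).astype(int)

# No .values here, so X stays a DataFrame and keeps its column labels.
X = dataset[["Age", "Tenure", "Balance"]]
y = dataset["Exited"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)

# X_train is a DataFrame, so its columns can label the importances.
feature_importances = pd.DataFrame(
    regressor.feature_importances_,
    index=X_train.columns,
    columns=["importance"],
).sort_values("importance", ascending=False)
```

Printing feature_importances then shows one row per feature name, sorted from most to least important.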

Upvotes: 0
