I was trying to use TPOTClassifier on the Forest Cover Type Prediction data, but after the initial run it produces errors. I would appreciate any suggestions on how to resolve them. Thank you.
from tpot import TPOTClassifier
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# loading the data
data = pd.read_csv("train.csv")
data_test = pd.read_csv("test.csv")
data.head()
data_test.head()
print(data['Cover_Type'].values)
data1 = data
data1 = data1.rename(columns={'Cover_Type': 'class'})
data1.dtypes
# columns 1..54 are the features (column 0 is Id); column 55 is the renamed label
features = list(data1.dtypes[1:55].index)
target = list(data1.dtypes[55:56].index)
print(data1.dtypes.tail())
## train test split
X_train , X_test, y_train, y_test = train_test_split(data1[features],data1[target],train_size=0.75, test_size=0.25)
X_train.head()
tpot = TPOTClassifier(generations=5, population_size=500, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_forest_pipeline.py')
But it produces the following output and error:
Generation 1 - Current best internal CV score: inf
Generation 2 - Current best internal CV score: inf
Generation 3 - Current best internal CV score: inf
Generation 4 - Current best internal CV score: inf
Generation 5 - Current best internal CV score: inf
ValueError                                Traceback (most recent call last)
in ()
      1 tpot =TPOTClassifier(generations=5, population_size=500, verbosity=2)
      2 tpot.fit(X_train, y_train)
      3 print (tpot.score(X_test, y_test))
      4 tpot.export('tpot_forest_pipeline.py')

    355 if not self._optimized_pipeline:
    356     raise ValueError('There was an error in the TPOT optimization '
    357                      'process. This could be because the data was '
    358                      'not formatted properly, or because data for '

ValueError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly, or because data for a regression problem was provided to the TPOTClassifier object. Please make sure you passed the data to TPOT correctly.
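The error text points at data formatting. One quick check is the type and shape of the target: selecting with a list of column names, as data1[target] does, returns a one-column DataFrame (2-D), while selecting a single column name returns a 1-D Series. The small frame below is made-up data standing in for train.csv, keeping only two of its feature columns.

```python
import pandas as pd

# Made-up stand-in for train.csv: an Id column, two feature columns,
# and the label already renamed to 'class' as in the question.
data1 = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Elevation": [2596, 2590, 2804, 2785],
    "Slope": [3, 2, 9, 18],
    "class": [5, 5, 2, 2],
})

target = ["class"]      # a list of names, like the question's `target`

y_2d = data1[target]    # list selector   -> DataFrame, shape (n_samples, 1)
y_1d = data1["class"]   # single selector -> Series,    shape (n_samples,)

print(type(y_2d).__name__, y_2d.shape)  # DataFrame (4, 1)
print(type(y_1d).__name__, y_1d.shape)  # Series (4,)
```

Because train_test_split returns splits of the same type it receives, passing data1['class'] instead of data1[target] would give a 1-D y_train directly.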
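The issue is caused by the target, data1[target]: because target is a list, this selects a one-column DataFrame, which is 2-D, whereas the classifier expects the labels as a 1-D array (the features in data1[features] are correctly 2-D). Converting the inputs to NumPy arrays and flattening the target with ravel() in the tpot.fit() call solves the input issue:
tpot.fit(np.array(X_train), np.array(y_train).ravel())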
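The same conversion can be written against NumPy directly (the pandas.np alias was deprecated in pandas 1.0 and removed in 2.0). The sketch below wraps it in a small helper with a couple of shape checks; the name to_tpot_inputs and the toy data are hypothetical, for illustration only, and TPOT itself is not imported.

```python
import numpy as np
import pandas as pd

def to_tpot_inputs(X, y):
    """Hypothetical helper: return X as a 2-D array and y as a 1-D array,
    the shapes scikit-learn-style classifiers such as TPOTClassifier expect."""
    X_arr = np.asarray(X)
    y_arr = np.asarray(y).ravel()  # (n, 1) -> (n,); 1-D input is unchanged
    if X_arr.ndim != 2:
        raise ValueError(f"X must be 2-D, got shape {X_arr.shape}")
    if y_arr.shape[0] != X_arr.shape[0]:
        raise ValueError("X and y have different numbers of rows")
    return X_arr, y_arr

# Toy data in place of the question's X_train / y_train.
X_train = pd.DataFrame({"Elevation": [2596, 2590, 2804], "Slope": [3, 2, 9]})
y_train = pd.DataFrame({"class": [5, 5, 2]})  # 2-D, like data1[target]

X_arr, y_arr = to_tpot_inputs(X_train, y_train)
print(X_arr.shape, y_arr.shape)  # (3, 2) (3,)
# tpot.fit(X_arr, y_arr)  # then fit as in the question
```

The row-count check also catches a target that accidentally has more than one column, since ravel() would flatten it into too many labels.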