Jordan Bryan

Reputation: 43

Scikit-learn GridSearchCV - Why am I receiving a data type error when I execute grid.fit()?

I have been working on a machine learning project in Python. After getting a basic neural net running well, I am trying to set up a grid search to optimize the hyperparameters using GridSearchCV from sklearn. The call to grid.fit(X, Y) throws this error: TypeError: only size-1 arrays can be converted to Python scalars. My interpretation is that fit doesn't like the format of the X and Y I gave it. This confuses me because the net ran fine without the grid search, and I didn't change the network or the data at all. Can anybody explain what is happening here and how I could fix it?

This code creates the network and the grid search:

#Creating the neural network
def create_model():
  model=Sequential()

  model.add(Dense(512, activation='relu',input_shape=(2606,)))
  model.add(Dense(256, activation='relu'))
  model.add(Dense(128, activation='relu'))
  model.add(Dense(64, activation='relu'))
  model.add(Dense(32, activation='relu'))
  model.add(Dense(16, activation='relu'))
  model.add(Dense(1, activation='relu'))

  opt=optimizers.Adam(lr=learn_rate)
  model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])

  #I commented this out because I believe it is delegated to the grid.fit() fn later on.
  #model.fit(X_train, Y_train, batch_size=30, epochs=6000, verbose=1)

  return model

#Now setting up the grid search
model=KerasClassifier(build_fn=create_model())

learn_rate=np.arange(.00001,.001,.00002).tolist()
batch_size=np.arange(10,2606,2).tolist()
epochs=np.arange(1000,10000,100).tolist()

param_grid=dict(learn_rate=learn_rate, batch_size=batch_size, epochs=epochs)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)

grid_result=grid.fit(X_train,Y_train) #This is the line referenced in the error message.

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Any advice would be greatly appreciated!

Edit: X_train has shape (167, 2606): each of the 167 elements is an array of length 2606, which is why the input_shape for the network is (2606,). Y_train has shape (167,).

Upvotes: 1

Views: 2282

Answers (1)

gallen

Reputation: 1292

The issue is that GridSearchCV builds a new model for every combination of parameters. You are passing an already-created model (build_fn=create_model() calls the function immediately instead of handing it over) along with lists of parameters. I believe that is the source of the array vs. scalar error. Below, I've altered your code (with some garbage sample data) so that it will run.

The primary changes to note are that I altered the signature of create_model to accept the parameter values you pass into the grid search. I also removed the assignment of the KerasClassifier instance to the variable model and instead pass KerasClassifier(build_fn=create_model), without the parentheses after create_model, directly as the estimator in GridSearchCV.

import numpy as np
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers
from sklearn.model_selection import GridSearchCV


#Creating the neural network
def create_model(learn_rate, batch_size, epochs):
    model=Sequential()

    model.add(Dense(512, activation='relu',input_shape=(2606,)))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(1, activation='relu'))

    opt=optimizers.Adam(lr=learn_rate)
    model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])

    #I commented this out because I believe it is delegated to the grid.fit() fn later on.
    #model.fit(X_train, Y_train, batch_size=30, epochs=6000, verbose=1)

    return model

#Now setting up the grid search
X_train = np.empty((167,2606), dtype=float, order='C')
Y_train = np.empty((167,), dtype=float, order='C')

learn_rate=np.arange(.00001,.001,.00002).tolist()
batch_size=np.arange(10,2606,2).tolist()
epochs=np.arange(1000,10000,100).tolist()

param_grid=dict(learn_rate=learn_rate, batch_size=batch_size, epochs=epochs)

grid = GridSearchCV(estimator=KerasClassifier(build_fn=create_model),
                    param_grid=param_grid, n_jobs=-1, cv=3)

grid_result=grid.fit(X_train,Y_train) #This is the line that raised the error in the question.

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
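
To make the "function vs. already-built model" distinction concrete, here is a minimal, library-free sketch. The names build and toy_grid_search are hypothetical stand-ins I made up for illustration; this is not how scikit-learn or the Keras wrapper is actually implemented, only the general idea that the search needs something it can call once per combination with scalar values.

```python
from itertools import product


def build(learn_rate):
    # Hypothetical stand-in for create_model: expects ONE scalar per call.
    if not isinstance(learn_rate, float):
        raise TypeError(
            "learn_rate must be a single float, got %s" % type(learn_rate).__name__
        )
    return {"learn_rate": learn_rate}


def toy_grid_search(build_fn, param_grid):
    """Call build_fn once per parameter combination and collect the results."""
    names = sorted(param_grid)
    models = []
    for values in product(*(param_grid[name] for name in names)):
        # Each call receives scalars, never the full lists.
        models.append(build_fn(**dict(zip(names, values))))
    return models


param_grid = {"learn_rate": [1e-5, 3e-5, 5e-5]}

# Passing the function itself works: it is called three times with scalars.
models = toy_grid_search(build, param_grid)
print(len(models))  # 3

# Handing the whole list to the builder up front fails with a TypeError.
try:
    build(param_grid["learn_rate"])
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)
```

The same reasoning is why build_fn=create_model (no parentheses) is what the wrapper needs: it can then rebuild the network for each set of values it is given.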

Upvotes: 1
