vishv

Reputation: 21

GridSearchCV error: ValueError: Sequential model 'sequential' has no defined outputs yet

I am trying to tune the hyperparameters of my deep learning neural network on a dataset that I have already done feature engineering on. I have kept only the relevant features and scaled the data with MinMaxScaler. I followed the steps I have seen online for finding the best parameters:

  1. Feature engineering and data scaling (pre-processing)
  2. Writing a build function for the neural network
  3. Creating a KerasRegressor object from that build function
  4. Creating a dictionary of the parameters I want to test
  5. Creating a GridSearchCV object with the KerasRegressor object as the estimator and the parameters dictionary as param_grid
  6. Fitting on the training set (from train_test_split)
  7. Printing best_params_

However, I ran into an error:

Traceback (most recent call last):
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\externals\loky\process_executor.py", line 428, in _process_worker
    r = call_item()
        ^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\externals\loky\process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\utils\parallel.py", line 127, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\model_selection\_validation.py", line 732, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\vishv\anaconda3\Lib\site-packages\scikeras\wrappers.py", line 760, in fit
    self._fit(
  File "C:\Users\vishv\anaconda3\Lib\site-packages\scikeras\wrappers.py", line 926, in _fit
    self._check_model_compatibility(y)
  File "C:\Users\vishv\anaconda3\Lib\site-packages\scikeras\wrappers.py", line 549, in _check_model_compatibility
    if self.n_outputs_expected_ != len(self.model_.outputs):
                                       ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\keras\src\models\sequential.py", line 277, in outputs
    raise ValueError(
ValueError: Sequential model 'sequential' has no defined outputs yet.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\vishv\OneDrive\Documents\Projects and Personal Learning\Spotify Top 200 Chart Analysis\prediction_test.py", line 100, in <module>
    grid = grid.fit(X_train,y_train)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\base.py", line 1151, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 898, in fit
    self._run_search(evaluate_candidates)
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 1419, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 845, in evaluate_candidates
    out = parallel(
          ^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\utils\parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\parallel.py", line 1098, in __call__
    self.retrieve()
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
ValueError: Sequential model 'sequential' has no defined outputs yet.

Below is my code. Note that I am fairly new to machine learning and neural nets:

# DataFrame Libraries
import pandas as pd
import numpy as np
import random as rnd

# Visualization Libraries
import matplotlib.pyplot as plt
from pandasgui import show
import seaborn as sns

# Machine Learning Libraries
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from scikeras.wrappers import KerasRegressor
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.metrics import R2Score
from tensorflow.keras.callbacks import EarlyStopping


# Read in Data
spotify_df = pd.read_csv('spotify_top_songs_audio_features.csv',index_col="id")

# Clean Data
    # Dropping source, mode, key, time_signature (no/little correlation to features)
spotify_df.drop(['source','mode', 'key', 'time_signature'],axis=1,inplace=True)

    # Mapping outlier in artist_names (Tyler, The Creator -> Tyler The Creator) 
def tyler_map(artist_names):
    if 'Tyler, The Creator' in artist_names:
        return artist_names.replace('Tyler, The Creator','Tyler The Creator')
    else:
        return artist_names

spotify_df['artist_names'] = spotify_df['artist_names'].apply(tyler_map)

    # Splitting artist names into lists of each artist + making dummies for each artist
spotify_df['artist_names'] = spotify_df['artist_names'].apply(lambda x:x.split(", "))

artist_dummy = pd.get_dummies(data=spotify_df['artist_names'].explode(),drop_first=True).groupby(level=0).sum()

    # Concat dummies to original list (without artist_names)
spotify_df = pd.concat([spotify_df.drop('artist_names',axis=1),artist_dummy],axis=1)

X = spotify_df.iloc[:,13:]
y = spotify_df['weeks_on_chart']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

scaler = MinMaxScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=0, patience=25)

def buildModel(optimizer='adam'):
    model = Sequential()
    

    model.add(Dense(234, activation = 'relu'))
    model.add(Dropout(0.1))

    for i in range(2):
        model.add(Dense(78, activation = 'relu'))
        model.add(Dropout(0.1))

        model.add(Dense(78, activation = 'relu'))
        model.add(Dropout(0.2))

    for i in range(5):
        model.add(Dense(39, activation = 'relu'))
        model.add(Dropout(0.1))

        model.add(Dense(39, activation = 'relu'))
        model.add(Dropout(0.2))

    for i in range(3):
        model.add(Dense(13, activation = 'relu'))
        model.add(Dropout(0.1))

        model.add(Dense(13, activation = 'relu'))
        model.add(Dropout(0.2))

    model.add(Dense(1, activation = 'linear'))

    model.compile(optimizer=optimizer,loss='mean_absolute_error',metrics=['mean_absolute_error'])

    return model

nn = KerasRegressor(model=buildModel,epochs=600,callbacks=[early_stop])

parameters = {'batch_size':[30,40,50,60,70],
              'optimizer':['adam','rmsprop','adamw']}

grid = GridSearchCV(estimator=nn,param_grid=parameters,scoring='neg_mean_absolute_error',cv=3)

grid = grid.fit(X_train,y_train)

print(grid.best_params_)

Upvotes: 2

Views: 1152

Answers (2)

Daouda

Reputation: 111

Instead of using GridSearchCV, I suggest you use Keras Tuner.

Upvotes: 0

Adrien Riaux

Reputation: 533

If you want to use GridSearchCV, I'd recommend using MLPRegressor from the Scikit-Learn API, as it is natively compatible. (And consider RandomizedSearchCV once you have a lot of hyperparameters to set.)

Also take a look at Pipeline in Scikit-Learn.
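A minimal sketch of that combination, assuming synthetic stand-in data and a small, arbitrarily chosen hidden_layer_sizes (neither is taken from the question). Putting the scaler inside the Pipeline means it is refit on each cross-validation training fold:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real features/target (assumption, not the Spotify data)
rng = np.random.default_rng(42)
X = rng.random((300, 10))
y = X @ rng.random(10) + 0.1 * rng.standard_normal(300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Scaling and the network live in one estimator, so GridSearchCV handles both
pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500,
                         early_stopping=True, random_state=42)),
])

# Pipeline step parameters are addressed as <step name>__<parameter>
parameters = {
    "mlp__batch_size": [30, 60],
    "mlp__solver": ["adam", "sgd"],
}

grid = GridSearchCV(pipe, param_grid=parameters,
                    scoring="neg_mean_absolute_error", cv=3)
grid.fit(X_train, y_train)

print(grid.best_params_)
```

MLPRegressor does not offer dropout or every Keras optimizer, so this is a trade of flexibility for compatibility rather than a drop-in replacement for the network in the question.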

Alternatively, you can use a framework dedicated to hyperparameter tuning like Optuna, which has good support for TensorFlow.
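If you would rather keep the SciKeras wrapper, the traceback suggests a likely cause: SciKeras reads model_.outputs before training, and a Sequential model with no input shape has not been built at that point, so Keras raises. Declaring an Input layer in the build function should avoid this. The sketch below is an assumption-laden illustration, not your exact network: it uses a much smaller stack of layers and relies on SciKeras passing a meta dictionary (with an n_features_in_ entry) to build functions that declare a meta parameter.

```python
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Sequential


def build_model(meta, optimizer="adam"):
    """Build a regression network whose input shape is known up front.

    `meta` is supplied by SciKeras at fit time; only n_features_in_ is used here.
    """
    model = Sequential()
    # Declaring the input shape builds the model, so model.outputs is defined
    model.add(Input(shape=(meta["n_features_in_"],)))
    model.add(Dense(78, activation="relu"))
    model.add(Dropout(0.1))
    model.add(Dense(39, activation="relu"))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation="linear"))
    model.compile(optimizer=optimizer, loss="mean_absolute_error",
                  metrics=["mean_absolute_error"])
    return model


# Stand-alone check with a hypothetical meta dictionary (5 input features)
model = build_model(meta={"n_features_in_": 5})
print(len(model.outputs), model.output_shape)
```

With a build function like this, KerasRegressor(model=build_model, ...) and the GridSearchCV call from the question should be usable unchanged, since the wrapper fills in meta itself.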

Upvotes: 0
