Reputation: 51
import os
from pylab import rcParams
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns; sns.set()
from numpy import *
from scipy import stats
from pandas.plotting import scatter_matrix
import sklearn
import warnings
from imblearn.over_sampling import SMOTE
import tensorflow as tf
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
data = pd.read_excel(r'Attrition Data Exercise.xlsx')
X = data.iloc[:, 3:-1].values
y = data.iloc[:, -1].values
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder
ct = ColumnTransformer(transformers=
[('one_encoder', OneHotEncoder(), [2, 5, 11, 13, 28]),
('ord_encoder', OrdinalEncoder(), [0])],
remainder='passthrough')
X = np.array(ct.fit_transform(X))
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dropout(rate=0.3))
ann.add(tf.keras.layers.Dense(units=6, activation='relu', kernel_regularizer='l1', bias_regularizer='l2'))
ann.add(tf.keras.layers.Dropout(rate=0.3))
ann.add(tf.keras.layers.Dense(units=3, activation='relu', kernel_regularizer='l1', bias_regularizer='l2'))
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
opt = tf.keras.optimizers.Adam(
learning_rate=0.001,
beta_1=0.9,
beta_2=0.999,
epsilon=1e-08)
ann.compile(optimizer = opt, loss = 'binary_crossentropy', metrics = ['accuracy', tf.keras.metrics.Recall()])
The above code runs successfully. It's when I run the below code in a cell that it causes an error.
pipe = Pipeline([('smt', SMOTE()), ('model', KerasClassifier(build_fn = ann, verbose = 0, epochs=170))])
weights = np.linspace(0.5, 0.5, 1)
gsc = GridSearchCV(
estimator = pipe,
param_grid = {
'smt__sampling_strategy' : weights
},
scoring = 'f1',
cv = 4)
grid_result = gsc.fit(X_train, y_train)
The code above results in the following error:
ValueError: The first argument to `Layer.call` must always be passed
Any idea what I might be doing wrong or what can be improved? I tried replacing KerasClassifier with KerasRegressor too just to see if something changes but nothing did. What essentially is going wrong?
I'm trying to use the Pipeline class from imblearn and GridSearchCV to get the best parameters for classifying the imbalanced dataset, I want to leave out resampling of the validation set and only resample the training set, which imblearn's Pipeline seems to be doing. However, I'm getting an error while implementing the accepted solution
Also link to the screenshot to the error trace is attached.Error Trace Complete
Upvotes: 5
Views: 7941
Reputation: 186
@danr got it correct. Many thanks to him. I was getting the same error when using KerasClassifier with sklearn's cross_val_score. Adding the lambda after build_fn solved the problem. I had a function create_model that created a keras Sequential model. Corrected code that runs smoothly (tensorflow 2.4.1):
from sklearn.model_selection import cross_val_score
# Create a KerasClassifier using best params determined using RandomizedSearchCV above
model = KerasClassifier(build_fn = lambda: create_model(learning_rate = 0.01, activation = 'tanh'), epochs = 50, batch_size = 32, verbose = 0)
# Calculate the accuracy score for each fold
kfolds = cross_val_score(model, X, y, cv = 3)
Upvotes: 5