matt525252

Reputation: 742

Scikit-Learn vs Keras (Tensorflow) for multinomial logistic regression

I am trying a simple multinomial logistic regression with Keras, but the results are quite different from the standard scikit-learn approach.

For example with iris data:

import numpy as np
import pandas as pd
df = pd.read_csv("./data/iris.data", header=None)

from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.3, random_state=52)

X_train = df_train.drop(4, axis=1)
y_train = df_train[4]

X_test = df_test.drop(4, axis=1)
y_test = df_test[4]

Using scikit-learn:

from sklearn.linear_model import LogisticRegression

scikit_model = LogisticRegression(multi_class='multinomial', solver='saga', max_iter=500)
scikit_model.fit(X_train, y_train)

the average weighted f1-score on test set:

y_test_pred = scikit_model.predict(X_test)

from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, labels=scikit_model.classes_))

is 0.96.

Then with Keras:

from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

# first encode the class values as integers, then one-hot encode them
encoder = LabelEncoder()
encoder.fit(y_train)
y_train_encoded = encoder.transform(y_train)
Y_train = to_categorical(y_train_encoded)
y_test_encoded = encoder.transform(y_test)
Y_test = to_categorical(y_test_encoded)

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.regularizers import l2

# model construction
input_dim = 4   # 4 input features
output_dim = 3  # 3 classes

def classification_model():
    model = Sequential()
    model.add(Dense(output_dim, input_dim=input_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
    return model

# training
keras_model = classification_model()
keras_model.fit(X_train, Y_train, epochs=500, verbose=0)

the average weighted f1-score on test set:

classes = np.argmax(keras_model.predict(X_test), axis=1)
y_test_pred = encoder.inverse_transform(classes)

from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, labels=encoder.classes_))

is 0.89.

Is it possible to make logistic regression in Keras behave identically (or at least as closely as possible) to scikit-learn's?

Upvotes: 5

Views: 2412

Answers (2)

meowongac

Reputation: 720

One obvious difference is that saga (a variant of SAG) is used in LogisticRegression, while SGD is used in your NN. As far as I know, LogisticRegression doesn't support SGD as a solver. Alternatively, you can use SGDClassifier (or SGDRegressor for regression tasks) instead of LogisticRegression. There are blog posts discussing the differences between them.
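
For illustration, here is a minimal sketch of that route (assuming a recent scikit-learn where the logistic loss is spelled 'log_loss'; older versions use loss='log'; note that SGDClassifier handles multiclass problems one-vs-rest rather than as a single multinomial softmax):

from sklearn.linear_model import SGDClassifier

# logistic loss + L2 penalty, trained with SGD: roughly analogous
# to the single softmax layer trained in Keras
sgd_model = SGDClassifier(loss='log_loss', penalty='l2', max_iter=500)
sgd_model.fit(X_train, y_train)
print(sgd_model.score(X_test, y_test))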

Upvotes: 3

ales_t

Reputation: 2017

I tried to run your examples and noticed a couple of potential sources of the discrepancy:

  • The test set is very small, only 45 instances. To get from an accuracy of .89 to .96, the model only needs to predict three more instances correctly, and due to randomness in training, your Keras results can oscillate quite a bit.
  • As explained by @meowongac (https://stackoverflow.com/a/59643522/1467943), you're using a different optimizer. One point is that scikit-learn's saga solver sets its step size automatically; for SGD in Keras, tweaking the learning rate and/or the number of epochs can lead to improvements.
  • scikit-learn quietly applies L2 regularization by default (C=1.0), while your Keras model is unregularized.

Using your code, I was able to get accuracy ranging from .89 to .96 by running SGD with the learning rate set to .05. When switching to Adam (also with this rather high learning rate), I got more stable results, ranging from .92 to .96 (though this is more of an impression, as I didn't run many trials).
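
For concreteness, here is a minimal sketch combining these tweaks: SGD with the learning rate set to .05 plus an L2 kernel regularizer. The l2 factor is only a rough translation of scikit-learn's default C=1.0 penalty, since the two libraries scale their losses differently, so treat it as a starting point:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.regularizers import l2

input_dim, output_dim = 4, 3
n_samples = len(X_train)

model = Sequential()
# L2 weight decay, roughly matching sklearn's 1/(2*C*n_samples) scaling with C=1.0
model.add(Dense(output_dim, input_dim=input_dim, activation='softmax',
                kernel_regularizer=l2(1.0 / (2 * n_samples))))
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.05),
              metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=500, verbose=0)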

Upvotes: 3
