hera

Reputation: 63

Why is my MLP model producing different F1 score with each run?

I am unsure why my MLP code produces a different F1 score with each run. The score also varies widely from run to run.

I have tried adding a random_state but am still seeing the same behaviour. I'm curious to know if there's anything I'm missing.

The code is as below for your reference:

import pandas as pd
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn import metrics
from sklearn.model_selection import train_test_split

# LOADED THE TRAINING DATASET
df = pd.read_csv('/content/wesad-chest-combined-classification-eda.csv')

# DROPPED THE CATEGORICAL COLUMNS
df = df.drop(['SSSQ class', 'condition'], axis='columns')

# REMOVED ALL THE ROWS WITH MISSING DATA
df = df.dropna()

# SEPARATED THE DATAFRAME INTO 'X' AND 'y' DATA
X = df.to_numpy()
y = df['SSSQ Label'].values

# DELETED THE 'SSSQ Label' COLUMN FROM 'X'
X = np.delete(X, 45, axis=1)

# SPLIT THE DATASET
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10)

# PERFORMED THE CLASSIFICATION USING MLP CLASSIFIER
mlp_clf = MLPClassifier(hidden_layer_sizes=100, activation='relu',
                        learning_rate_init=0.001, learning_rate='adaptive',
                        momentum=0.9, solver='adam')
mlp_clf.fit(X_train, y_train)
mlp_clf.score(X_test, y_test)

# PREDICT 'y_pred' FROM 'X_test'
y_pred = mlp_clf.predict(X_test)

# PERFORMANCE METRICS
print('RESULTS AFTER APPLYING MLP CLASSIFIER ON GIVEN DATA:')
print('  Accuracy: ' + str(metrics.accuracy_score(y_test, y_pred)))
print('    Recall: ' + str(metrics.recall_score(
    y_test, y_pred, average='weighted', labels=np.unique(y_pred))))
print(' Precision: ' + str(metrics.precision_score(
    y_test, y_pred, average='weighted', labels=np.unique(y_pred))))
print('  F1 Score: ' + str(metrics.f1_score(
    y_test, y_pred, average='weighted', labels=np.unique(y_pred))))
print('\nCROSS TAB RESULTS: [0 = Low, 1 = Medium, 2 = High]')
pd.crosstab(y_test, y_pred, colnames=['Stress Levels'], rownames=[None])

Upvotes: 0

Views: 629

Answers (1)

Matt Hall

Reputation: 8152

Several of the learning algorithms in sklearn are stochastic — they contain a random process like initializing parameters or sampling the data for cross-validation, etc. The clue is the presence of random_state among the hyperparameters (see the docs for this estimator). You need to set that to some seed (an integer) to remove that random variance.
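
For example, here is a minimal sketch based on the code in the question (the seed value 0 is arbitrary; any fixed integer will do), passing the seed directly to the classifier:

from sklearn.neural_network import MLPClassifier

# Same hyperparameters as before, plus a fixed seed so that weight
# initialization and minibatch shuffling are reproducible across runs.
mlp_clf = MLPClassifier(hidden_layer_sizes=100, activation='relu',
                        learning_rate_init=0.001, learning_rate='adaptive',
                        momentum=0.9, solver='adam',
                        random_state=0)

With that in place, repeated runs on the same train/test split should produce the same F1 score.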

I strongly recommend reading the documentation for a model you plan to use — Scikit-Learn docs are excellent and there's a lot of important stuff in there.

Upvotes: 1
