Cannot get good accuracy from sklearn MLP classifier

Question

I have been given several years of hourly data for ozone (O3), NO, NO2 and CO, and the task is to use it to predict the ozone value. Suppose I have data for 2015, 2016, 2018 and 2019: I need to predict the 2019 ozone values using the 2015, 2016 and 2018 data.
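The split described above (train on the earlier years, test on 2019) could be sketched as follows. This is a minimal sketch that assumes the merged file keeps a datetime column; the column name `timestamp` and the helper `split_by_year` are hypothetical, since the master file described in the question has only NO, NO2, CO and O3.

```python
import pandas as pd

FEATURES = ["NO", "NO2", "CO"]


def split_by_year(df: pd.DataFrame, time_col: str, test_year: int):
    """Hypothetical helper: train on years before `test_year`, test on `test_year`.

    `time_col` is an assumed datetime column; keeping only earlier years for
    training avoids using information from after the test period.
    """
    years = pd.to_datetime(df[time_col]).dt.year
    train = df[years < test_year]
    test = df[years == test_year]
    return train[FEATURES], test[FEATURES], train["O3"], test["O3"]


# Tiny synthetic example: one reading per year, for illustration only.
demo = pd.DataFrame({
    "timestamp": pd.to_datetime(["2015-01-01 01:00", "2016-01-01 01:00",
                                 "2018-01-01 01:00", "2019-01-01 01:00"]),
    "NO": [5.0, 6.0, 7.0, 8.0],
    "NO2": [10.0, 11.0, 12.0, 13.0],
    "CO": [300.0, 310.0, 320.0, 330.0],
    "O3": [20.0, 22.0, 24.0, 26.0],
})
X_train, X_test, y_train, y_test = split_by_year(demo, "timestamp", 2019)
print(len(X_train), len(X_test))  # 3 1
```

Unlike a shuffled `train_test_split`, this keeps the test set strictly in the future relative to the training data, which matches the stated goal.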

The data is recorded hourly and arranged by month (an image of the format was attached to the original post).

What I have done: first, I combined all the years into one Excel file with four columns, NO, NO2, CO and O3, appending the data month by month. This is the master file used in the code (an image of it was attached to the original post).

I used Python. The data has to be cleaned first. NO, NO2 and CO are precursors of ozone, meaning that ozone formation depends on these gases. The cleaning constraints were: remove any negative value, and if any of the O3, NO, NO2 or CO values in a row is invalid, drop the whole row rather than just that cell. The data also contained strings, which had to be removed as well. All of this was done. Then I applied MLPClassifier from sklearn. Here is my code:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.neural_network import MLPClassifier

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

bugs = ['NOx', '* 43.3', '* 312', '11/19', '11/28', '06:00', '09/30', '09/04', '14:00', '06/25', '07:00', '06/02',
        '17:00', '04/10', '04/17', '18:00', '02/26', '02/03', '01:00', '11/23', '15:00', '11/12', '24:00', '09/02',
        '16:00', '09/28', '* 16.8', '* 121', '12:00', '06/24', '13:00', '06/26', 'Span', 'NoData', 'ppb', 'Zero',
        'Samp<', 'RS232']
dataset = pd.read_excel("Testing.xlsx")

dataset = pd.DataFrame(dataset).replace(bugs, 0)
dataset.dropna(subset=["O3"], inplace=True)
dataset.dropna(subset=["NO"], inplace=True)
dataset.dropna(subset=["NO2"], inplace=True)
dataset.dropna(subset=["CO"], inplace=True)

dataset.drop(dataset[dataset['O3'] < 1].index, inplace=True)
dataset.drop(dataset[dataset['O3'] > 160].index, inplace=True)
dataset.drop(dataset[dataset['O3'] == 0].index, inplace=True)

dataset.drop(dataset[dataset['NO'] < 1].index, inplace=True)
dataset.drop(dataset[dataset['NO'] > 160].index, inplace=True)
dataset.drop(dataset[dataset['NO'] == 0].index, inplace=True)

dataset.drop(dataset[dataset['NO2'] < 1].index, inplace=True)
dataset.drop(dataset[dataset['NO2'] > 160].index, inplace=True)
dataset.drop(dataset[dataset['NO2'] == 0].index, inplace=True)

dataset.drop(dataset[dataset['CO'] < 1].index, inplace=True)
dataset.drop(dataset[dataset['CO'] > 4000].index, inplace=True)
dataset.drop(dataset[dataset['CO'] == 0].index, inplace=True)
dataset = dataset.reset_index()
dataset = dataset.drop(['index'], axis=1)
feat = dataset[["NO", "NO2", "CO"]].astype(int)
label = dataset[["O3"]].astype(int)
X_train, X_test, y_train, y_test = train_test_split(feat, label, test_size=0.1)

# X_train = dataset.iloc[0:9200, 0:3].values.astype(int)
# y_train = dataset.iloc[0:9200, 3].values.astype(int)
# X_test = dataset.iloc[9200:9393, 0:3].values.astype(int)
# y_test = dataset.iloc[9200:9393, 3].values.astype(int)
sc_x = StandardScaler()
X_train = sc_x.fit_transform(X_train)
# Reuse the training-set mean/std; refitting on X_test would leak test statistics
X_test = sc_x.transform(X_test)


def accuracy(cm):  # fraction of predictions on the diagonal, i.e. exact matches
    diagonal_sum = cm.trace()
    sum_of_all_elements = cm.sum()
    return diagonal_sum / sum_of_all_elements


classifier = MLPClassifier(hidden_layer_sizes=(250, 100, 10), max_iter=100000, activation='relu', solver='adam',
                           random_state=1)
classifier.fit(X_train, y_train.values.ravel())

y_pred = classifier.predict(X_test)
print(f"\n{X_test}\n  ----> \nPredictions : \n{y_pred}\n{y_pred.shape}\n")
cm = confusion_matrix(y_test, y_pred)  # sklearn expects (y_true, y_pred)
print(f"\nAccuracy of MLP.Cl : {accuracy(cm)}\n")
print(accuracy_score(y_test, y_pred))


y_test = y_test.reset_index(drop=True).head(100)
# y_test = y_test.drop([19,20],axis=0)
y_pred = pd.DataFrame(y_pred)
# No shift(-1): row i of y_pred is the prediction for row i of y_test
y_pred = y_pred.head(100)
# y_pred = y_pred.drop([19,20],axis=0)
plt.figure(figsize=(10, 5))
plt.plot(y_pred, color='r', label='PredictedO3')
plt.plot(y_test, color='g', label='OriginalO3')
plt.legend()
plt.show()
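The cleaning rules described in the question (drop any row where O3, NO, NO2 or CO is non-numeric, missing, below 1 or above its cap) can also be expressed without listing every bad string. A minimal sketch, assuming the same upper bounds as the code (160 for O3, NO and NO2, 4000 for CO); the helper name `clean` is hypothetical, values stay as floats instead of being cast to `int`, and only the four gas columns are returned.

```python
import pandas as pd

# Upper bounds copied from the question's filters; the lower bound of 1
# matches its "< 1" and "== 0" checks.
UPPER = {"O3": 160, "NO": 160, "NO2": 160, "CO": 4000}


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only rows where every gas value is valid."""
    cols = list(UPPER)
    # Any non-numeric entry ("NoData", "Span", "* 43.3", ...) becomes NaN.
    numeric = df[cols].apply(pd.to_numeric, errors="coerce")
    # A row survives only if all four values are present and in [1, upper];
    # between() returns False for NaN, so missing values are dropped too.
    valid = numeric.notna().all(axis=1)
    for col, upper in UPPER.items():
        valid &= numeric[col].between(1, upper)
    return numeric[valid].reset_index(drop=True)


raw = pd.DataFrame({
    "NO": [5, "NoData", 3, 200],
    "NO2": [10, 12, "* 43.3", 15],
    "CO": [300, 310, 320, 330],
    "O3": [20, 22, 24, 26],
})
print(clean(raw))  # only the first row survives
```

Coercing to numbers handles any unexpected string, not only the ones collected in a hand-written list.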

The resulting plot was attached to the original post (not reproduced here). The console output is:

PyDev console:
[[-0.53939794 -0.59019756 -0.53257553]
 [ 2.55576818  0.45245455 -0.7648624 ]
 [-0.36744427  0.73681421 -0.30028866]
 ...
 [-0.59671583 -0.02147823  1.81678204]
 [-0.25280849  0.73681421  1.31145621]
 [-0.53939794  0.64202766  0.18466113]]
  ----> 
Predictions : 
[15 39 45 40 42 11 14 32 23 23 21 23  3 15 23 59 15 34 12 10 42 23 12  8
 14  3  8 42 12 61 36 13 11 20 12 10 14 42 12 20  9  5 14 11 20 14 10 85
 42 73 43 23 61 85 55 13 14 20 85 32 15 15 42 42 12 23 13 23 85  8 23 11
 36 32 20 12 27 35 55 17 15 23 12 44 42 17 23 45 35 23  3 11 23 12 60 11
 15 39 15 44 49  7 35 42 45 13 12 55 42 18 42  6 23 14 60 43 16 18 10 43
 85 20 23 88  8 20 26 23 53 45 16  4 48 27  3 61 15  7 23  6 40 12 44 12
 12  4 12 13 24 24 23 15 16 13 40 12 12 10 12 15 53 12 42 45 38 23 45 17
 12 30 12 45 60 65 12 52  4 35  3 15 11 23 40 42 18 23 45 45 49 43 35 62
 46 14 21 11  6 24 23 16 23 21 45 42 85 39 12 16 10 38 43  6 23 20 11 65
 14 14 14 45 24 18 85 60 15 10 16 14 23 10 17  6 13 42  4  7 17 51 23  3
 85 42 23 55 21 15 32 14 17 12 42 18 16  8  6 10 14 12 42 15 14 43 25 12
 14 15 85 20 42 23 43 32 18 12 42 35  6 47 12 20 12  6 51  8 20 45 40 43
 12 14 44 23 23 21 15 45 24 12 23 23 42 15 12 46 35  8 14 16 42 11 42 16
 13 61 60 25 26 16 45 10 17  5 43 21 26 12 49 12 42 11 38 48 21 45  9 48
 11 20 13 23 16 21 11 12 44 55 11 16 53 45  8 17 12  9 85 56  7 23 23 26
 12 42 42 51 17 23 43 52 24 12 29 11 21 42 16  6 20 18 16  8 14 15 13 43
 10 23 16 15 42 43 23 11 14 25 47 16 24 14  7 43 45 14  5 18 51 42 20 15
 39 32 12 44 13 51 12 43 42 23 42 17 11 12 11 42 12  5 35 51 23 51 14  9
 11 34 18 21 88 21 15 15  6 49 12 51  8 12 49  8  4 17 15  6 26  3 15 43
 14  5 23 15 88 21 85 11 23 25 45 14 12 65 45 27 48 42 12 14 44 45  4 44
 40 16 23 25 15 10 20 12 15 62  6 13 20 20 11 56 12 40 11 14 25  6 25 12
 40 85 40 85 43 11 14 32 11  8  6  8 23 12 26 18 60 18 51 40 13 51 12  8
 23 45 20  4 23 11  3 12 51 11 18 12 40 14 40  7 85 44 60 85 45 14 14 14
 11 55 18 16 45 13 23 51 11 14 23 18 14  7 40 23 15 32 12 12 23 42 49 88
 11 11 42  6 25 12  6 11 18  6 13 35  8 15 42 39 23  9 23 32 20 21 12 20
 20 38  7 12 42  8 13 17 55 60 16 39 18 42 42 12 60 14 16 40  9 18 85 40
  5 14 23 45 10 24 14 25 11 17 15 42 42 15 23 15  8 34 16 60 42 14 48 51
 11  6 51 15 42 12 42 20 12 25 26 25 45 26 40 48 23 45 23 21 11 17 48 12
 12  6 15 34 10 16 18 17 13 20 45  3  9 39 12 11 15 23 42 45 45 65 51  6
 45 15 15 17 51  8 51 34 14 17 13 38 38 21 18 51 55 16  9 44 42  6 42 17
  6 25 88 11 10 48 20 40 21 12 44 27 47 42 38 15 49 12 12 12  6 12  8 16
 42  9 20 18 23 18 12 13 20 16 14 12 23 10 60 18 25 23 43 21 12 12 10 61
 21 40  6 16 45 38 12 17 12 15 32  9 38 17 14 11  6 15 14  6 48 21 13 13
 15 36  3 45 25 29 24 16  8 10 27 21 20 51 10 16 21 12 20 23 46 23  3 34
 29 15 23 15 48 42 17 42 43 15 35 34 23 23 44 23  4 35 12 42 49 36 15 18
 15 14 11 18 16 20 15 25  9 43 51 45 12 15 39 21 51 18 24 26 17  9 42 44
 12 30 32  8 20 44 52 20 23 23 15 12 12 42  8  5 42 23 21 16 24 65 16 12
 38 36 43 60 15  7 85 15 26 42 40 11 12 23 12 20 40 23 42  6 23 52 16 20
 23 45 51  9 42 42 25  6 21 23 15  8 12 12 26 11 16 15 39  8 26 43 48 47
 12 48 12 11]
(940,)

and

Accuracy of MLP.Cl : 0.0425531914893617
0.0425531914893617

I can't get correct results: almost none of the predictions match the actual values.

desertnaut · Accepted Answer

You are trying to predict a continuous value, which is a regression problem, not a classification one; consequently, MLPClassifier is the wrong model to apply here - the correct one being an MLPRegressor.
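A minimal sketch of that switch, assuming the cleaned NO, NO2 and CO features and the O3 target from the question; synthetic data stands in here, and the layer sizes and `max_iter` are arbitrary choices, not tuned values. Wrapping the scaler and the model in a pipeline fits the scaler on the training data only and reuses those statistics on the test data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for [NO, NO2, CO] -> O3; replace with the real data.
rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(500, 3))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0, 1, 500)

# Chronological-style split: first 400 rows for training, rest for testing.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

model = make_pipeline(
    StandardScaler(),  # fitted on X_train only, inside model.fit
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=1),
)
model.fit(X_train, y_train)  # the target stays continuous; no astype(int)
y_pred = model.predict(X_test)  # continuous ozone estimates, not class labels
```

The regressor outputs real-valued predictions, so it no longer treats each integer ozone value as a separate class.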

On top of this, accuracy is meaningful for classification problems only, and it is meaningless in regression ones, like yours here; so, after switching to the correct model, you should also use some other performance metric suitable for regression problems.
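For example, mean absolute error (MAE), root mean squared error (RMSE) and the coefficient of determination (R²) are common choices. A minimal sketch using scikit-learn's metric functions; the helper name `regression_report` and the toy values are illustrative only. RMSE is taken as the square root of `mean_squared_error` so the code does not depend on newer or deprecated keyword options.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Hypothetical helper bundling common regression metrics."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),  # average absolute error
        "RMSE": float(np.sqrt(mse)),  # same units as the target
        "R2": r2_score(y_true, y_pred),  # 1.0 is a perfect fit
    }


# Toy ozone values for illustration only.
y_true = [20.0, 35.0, 42.0, 15.0]
y_pred = [22.0, 33.0, 45.0, 15.0]
print(regression_report(y_true, y_pred))  # MAE 1.75, RMSE ~2.06, R2 ~0.964
```

Here three of the four predictions are off by a few units, so an exact-match "accuracy" would be only 25%, while the metrics above show the predictions are in fact close.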
