How to use Root Mean Square Error for optimizing Neural Network in Scikit-Learn?

Question

I am new to neural networks, so please pardon any silly questions. I am working with a weather dataset, using DewPoint, Humidity, WindDirection and WindSpeed to predict temperature. I have read several papers on this, so I was intrigued to do some research of my own. First I train the model on 4000 observations, and then I try to predict the next 50 temperature points.

Here goes my entire code.

from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing
import numpy as np
import pandas as pd

df = pd.read_csv('WeatherData.csv', sep=',', index_col=0)

X = np.array(df[['DewPoint', 'Humidity', 'WindDirection', 'WindSpeed']])
y = np.array(df[['Temperature']])

# nan_array = pd.isnull(df).any(1).nonzero()[0]

neural_net = MLPRegressor(
    activation='logistic',
    learning_rate_init=0.001,
    solver='sgd',
    learning_rate='invscaling',
    hidden_layer_sizes=(200,),
    verbose=True,
    max_iter=2000,
    tol=1e-6
)
# Scaling the data
max_min_scaler = preprocessing.MinMaxScaler()
X_scaled = max_min_scaler.fit_transform(X)
y_scaled = max_min_scaler.fit_transform(y)


neural_net.fit(X_scaled[0:4001], y_scaled[0:4001].ravel())

predicted = neural_net.predict(X_scaled[5001:5051])

# Scale back to actual scale
max_min_scaler = preprocessing.MinMaxScaler(feature_range=(y[5001:5051].min(), y[5001:5051].max()))
predicted_scaled = max_min_scaler.fit_transform(predicted.reshape(-1, 1))

print("Root Mean Square Error ", mean_squared_error(y[5001:5051], predicted_scaled))

The first thing that confuses me is that the same program gives a different RMS error on each run. Why? I don't get it.

Run 1:

Iteration 1, loss = 0.01046558
Iteration 2, loss = 0.00888995
Iteration 3, loss = 0.01226633
Iteration 4, loss = 0.01148097
Iteration 5, loss = 0.01047128
Training loss did not improve more than tol=0.000001 for two consecutive epochs. Stopping.
Root Mean Square Error  22.8201171703

Run 2 (significant improvement):

Iteration 1, loss = 0.03108813
Iteration 2, loss = 0.00776097
Iteration 3, loss = 0.01084675
Iteration 4, loss = 0.01023382
Iteration 5, loss = 0.00937209
Training loss did not improve more than tol=0.000001 for two consecutive epochs. Stopping.
Root Mean Square Error  2.29407183124

In the documentation of MLPRegressor I could not find a way to directly hit the RMS error and keep the network running until I reach the desired RMS error. What am I missing here?

Please help!

lejlot · Accepted Answer

The first thing that confuses me is that the same program gives a different RMS error on each run. Why? I don't get it.

Neural networks are prone to local optima. There is never a guarantee that you will learn anything decent, nor (as a consequence) that multiple runs will lead to the same solution. The learning process is heavily random: it depends on the weight initialization, the sampling order and so on, so this kind of behaviour is expected.
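If you only want repeatable numbers between runs, one option is to pin the seed. The sketch below assumes synthetic data in place of WeatherData.csv (which is not available here) and uses the `random_state` parameter of MLPRegressor; the helper name `fit_and_predict` is made up for illustration. It does not make any single run "better", it only makes the randomness reproducible.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the four weather features (an assumption, not the asker's data).
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.randn(200)


def fit_and_predict(seed):
    # random_state seeds both the weight initialization and the shuffling of samples.
    model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=200, random_state=seed)
    model.fit(X, y)
    return model.predict(X[:5])


same_a = fit_and_predict(seed=42)
same_b = fit_and_predict(seed=42)

# Two runs with the same seed should give the same predictions.
print(np.allclose(same_a, same_b))
```

Trying several seeds and comparing the resulting errors is also a cheap way to see how much of the variation comes from initialization alone.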

In the documentation of MLPRegressor I could not find a way to directly hit the RMS error and keep the network running until I reach the desired RMS error.

Neural networks in scikit-learn are extremely basic, and they do not provide this kind of flexibility. If you need to work with more complex settings, you simply need a more NN-oriented library, such as Keras or TensorFlow. The scikit-learn community struggled a lot even to get this NN implementation in, and it does not seem like they are going to add much more flexibility in the near future.
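There is no built-in "stop at this RMSE" option, but if you want to stay in scikit-learn, a rough workaround is to drive the training loop yourself with `partial_fit`, which performs one pass over the data per call. This is only a sketch under assumptions: the data is synthetic, the function name `train_until_rmse` and the target value are invented for the example, and the RMSE is measured on a held-out validation slice rather than on the points you finally report on.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor


def train_until_rmse(model, X_train, y_train, X_val, y_val, target_rmse, max_epochs=500):
    """Call partial_fit repeatedly until the validation RMSE reaches target_rmse
    or max_epochs is exhausted. Returns the RMSE recorded after each epoch."""
    history = []
    for _ in range(max_epochs):
        model.partial_fit(X_train, y_train)  # one iteration over the training data
        rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
        history.append(rmse)
        if rmse <= target_rmse:
            break
    return history


# Synthetic stand-in data (an assumption, not the asker's weather dataset).
rng = np.random.RandomState(0)
X = rng.rand(500, 4)
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

model = MLPRegressor(hidden_layer_sizes=(50,), solver='adam', random_state=0)
history = train_until_rmse(model, X[:400], y[:400], X[400:], y[400:], target_rmse=0.1)
print("epochs run:", len(history), "last validation RMSE:", history[-1])
```

Nothing guarantees the target is ever reached, which is why the loop is capped by `max_epochs`. Note also that `partial_fit` is only available for the 'sgd' and 'adam' solvers.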

As a minor point, the use of MinMaxScaler seems slightly odd. You should not call fit_transform each time; you should fit only once and later use transform (or inverse_transform). In particular, it should be:

y_max_min_scaler = preprocessing.MinMaxScaler()
y_scaled = y_max_min_scaler.fit_transform(y)

...

predicted_scaled = y_max_min_scaler.inverse_transform(predicted.reshape(-1, 1))
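Putting the scaling advice together, an end-to-end version might look like the sketch below. The assumptions: synthetic data instead of WeatherData.csv, separate scalers for X and y (named `x_scaler` and `y_scaler` here), and scalers fitted on the training rows only, which is my own choice rather than something stated above. Also note that `mean_squared_error` returns the MSE, so the square root has to be taken to get an RMSE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the weather data (an assumption).
rng = np.random.RandomState(0)
X = rng.rand(600, 4) * 100.0
y = (0.3 * X[:, 0] - 0.1 * X[:, 1] + 15.0).reshape(-1, 1)

X_train, X_test = X[:500], X[500:550]
y_train, y_test = y[:500], y[500:550]

# Fit each scaler exactly once; reuse it afterwards via transform / inverse_transform.
x_scaler = MinMaxScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(y_train)

model = MLPRegressor(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
model.fit(x_scaler.transform(X_train), y_scaler.transform(y_train).ravel())

# Predict in the scaled space, then map back to the original units.
predicted = model.predict(x_scaler.transform(X_test))
predicted_unscaled = y_scaler.inverse_transform(predicted.reshape(-1, 1))

mse = mean_squared_error(y_test, predicted_unscaled)
rmse = np.sqrt(mse)  # RMSE is the square root of the MSE
print("Root Mean Square Error", rmse)
```

Because the predictions are mapped back with the same scaler that was fitted on y, the error is reported in real temperature units instead of being stretched to the range of the test targets.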
