Reputation: 325
I am trying to create a program in Python that uses machine learning to predict the square root of a number. Here is what I have done in my program.
My actual code:
import numpy as np
import tensorflow as tf
import pandas as pd
df = pd.read_csv('CSV/SQUARE-ROOT.csv')
X = df.iloc[:, 1].values
X = X.reshape(-1, 1)
y = df.iloc[:, 0].values
y = y.reshape(-1, 1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_test_sc = sc.fit_transform(X_test)
X_train_sc = sc.fit_transform(X_train)
sc1 = StandardScaler()
y_test_sc1 = sc1.fit_transform(y_test)
y_train_sc1 = sc1.fit_transform(y_train)
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=6))
ann.add(tf.keras.layers.Dense(units=6))
ann.add(tf.keras.layers.Dense(units=1))
ann.compile(optimizer='SGD', loss=tf.keras.losses.MeanSquaredError())
ann.fit(x = X_train_sc, y = y_train_sc1, batch_size=5, epochs = 100)
print(sc.inverse_transform(ann.predict(sc.fit_transform([[144]]))))
OUTPUT: array([[143.99747]], dtype=float32)
Shouldn't the output be 12? Why is it giving me the wrong result?
I am attaching the CSV file I used to train my model as well: SQUARE-ROOT.csv
Upvotes: 2
Views: 1213
Reputation: 889
The reason your code does not work is that you apply fit_transform to your test set, which is wrong. You can fix it by replacing fit_transform(test) with transform(test). Although I don't think StandardScaler is necessary here, please try this:
import numpy as np
import tensorflow as tf
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
N = 10000
X = np.arange(1, N).reshape(-1, 1)
y = np.sqrt(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)
sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
#X_test_sc = sc.fit_transform(X_test) # wrong!!!
X_test_sc = sc.transform(X_test)
sc1 = StandardScaler()
y_train_sc1 = sc1.fit_transform(y_train)
#y_test_sc1 = sc1.fit_transform(y_test) # wrong!!!
y_test_sc1 = sc1.transform(y_test)
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=32, activation='relu')) # you have 10,000 data points, so a slightly deeper network may help
ann.add(tf.keras.layers.Dense(units=32, activation='relu'))
ann.add(tf.keras.layers.Dense(units=32, activation='relu'))
ann.add(tf.keras.layers.Dense(units=1))
ann.compile(optimizer='SGD', loss='MSE')
ann.fit(x=X_train_sc, y=y_train_sc1, batch_size=32, epochs=100, validation_data=(X_test_sc, y_test_sc1))
#print(sc.inverse_transform(ann.predict(sc.fit_transform([[144]])))) # wrong!!!
print(sc1.inverse_transform(ann.predict(sc.transform([[144]]))))
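As a footnote to the fix (my own illustration, not part of the code above): fit_transform re-estimates the mean and standard deviation from whatever you pass it, so calling it on a single new sample, as sc.fit_transform([[144]]) does in the question, always yields 0 regardless of the value, while transform reuses the statistics learned from the training split. A minimal sketch with made-up numbers:
import numpy as np
from sklearn.preprocessing import StandardScaler
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
sc = StandardScaler()
sc.fit_transform(X_train)                     # learns mean=2.5, std~1.118 from the training data
print(sc.transform([[10.0]]))                 # ~[[6.708]], scaled with the *training* statistics
print(StandardScaler().fit_transform([[10.0]]))  # [[0.]], a fresh fit maps any single sample to 0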
Upvotes: 0
Reputation: 19153
TL;DR: You really need those nonlinearities.
In general, a model failing to learn can have one (or a combination) of several causes: a bad input data range, flaws in your data, over-/underfitting, etc.
However, in this specific case the model you build literally can't learn the function you're trying to approximate, because not having nonlinearities makes this a purely linear model, which can't accurately approximate nonlinear functions.
A Dense layer is implemented as follows:
x_res = activ_func(w*x + b)
where x is the layer input, w the weights, b the bias vector, and activ_func the activation function (if one is defined).
Your model, then, mathematically becomes (I'm using indices 1 to 3 for the three Dense layers):
pred = w3 * (w2 * ( w1 * x + b1 ) + b2 ) + b3
= w3*w2*w1*x + w3*w2*b1 + w3*b2 + b3
As you can see, the resulting model is still linear. Add activation functions and your model becomes capable of learning nonlinear functions too. From there, experiment with the hyperparameters and see how the performance of your model changes.
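To make the algebra concrete, here is a minimal sketch (my own, not part of the answer) that collapses an untrained copy of such a network into a single w*x + b and checks that both compute the same output:
import numpy as np
import tensorflow as tf
# Three Dense layers with no activations, like the question's model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(6),
    tf.keras.layers.Dense(6),
    tf.keras.layers.Dense(1),
])
# Compose the layer weights by hand: w3*(w2*(w1*x + b1) + b2) + b3
(w1, b1), (w2, b2), (w3, b3) = [layer.get_weights() for layer in model.layers]
w = w1 @ w2 @ w3                  # collapses to a single 1x1 weight
b = (b1 @ w2 + b2) @ w3 + b3      # and a single bias
x = np.array([[144.0]], dtype=np.float32)
print(model.predict(x))           # network output
print(x @ w + b)                  # same value, up to float rounding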
Upvotes: 1