CYBERDEVILZ

Reputation: 325

Predicting the square root of a number using Machine Learning

I am trying to create a program in Python that uses machine learning to predict the square root of a number. Here is what I have done in my program:

  1. created a csv file with numbers and their squares
  2. extracted the data from csv into suitable variables (X stores squares, y stores numbers)
  3. scaled the data using sklearn's StandardScaler
  4. built the ANN with two hidden layers each of 6 units (no activation functions)
  5. compiled the ANN using SGD as the optimizer and mean squared error as the loss function
  6. trained the model. Loss was around 0.063
  7. tried predicting, but the result is something else.

My actual code:

import numpy as np
import tensorflow as tf
import pandas as pd

df = pd.read_csv('CSV/SQUARE-ROOT.csv')

X = df.iloc[:, 1].values
X = X.reshape(-1, 1)
y = df.iloc[:, 0].values
y = y.reshape(-1, 1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_test_sc = sc.fit_transform(X_test)
X_train_sc = sc.fit_transform(X_train)
sc1 = StandardScaler()
y_test_sc1 = sc1.fit_transform(y_test)
y_train_sc1 = sc1.fit_transform(y_train)

ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=6))
ann.add(tf.keras.layers.Dense(units=6))
ann.add(tf.keras.layers.Dense(units=1))

ann.compile(optimizer='SGD', loss=tf.keras.losses.MeanSquaredError())

ann.fit(x = X_train_sc, y = y_train_sc1, batch_size=5, epochs = 100)

print(sc.inverse_transform(ann.predict(sc.fit_transform([[144]]))))

OUTPUT: array([[143.99747]], dtype=float32)

Shouldn't the output be 12? Why is it giving me the wrong result?

I am attaching the csv file I used to train my model as well: SQUARE-ROOT.csv

Upvotes: 2

Views: 1213

Answers (2)

wong.lok.yin

Reputation: 889

The reason your code does not work is that you apply fit_transform to your test set, which is wrong. You can fix it by replacing fit_transform(test) with transform(test). You also inverse-transform the prediction with the X scaler (sc) when it should be the y scaler (sc1), since the network outputs values in the scaled y space. Although I don't think StandardScaler is necessary here, please try this:

import numpy as np
import tensorflow as tf
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

N = 10000
X = np.arange(1, N).reshape(-1, 1)
y = np.sqrt(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)


sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)    
#X_test_sc = sc.fit_transform(X_test)      # wrong!!!
X_test_sc = sc.transform(X_test)

sc1 = StandardScaler()       
y_train_sc1 = sc1.fit_transform(y_train)    
#y_test_sc1 = sc1.fit_transform(y_test)   # wrong!!!
y_test_sc1 = sc1.transform(y_test)

ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=32, activation='relu'))    # with 10000 samples, a slightly deeper network may help
ann.add(tf.keras.layers.Dense(units=32, activation='relu'))
ann.add(tf.keras.layers.Dense(units=32, activation='relu'))
ann.add(tf.keras.layers.Dense(units=1))

ann.compile(optimizer='SGD', loss='MSE')
ann.fit(x=X_train_sc, y=y_train_sc1, batch_size=32, epochs=100, validation_data=(X_test_sc, y_test_sc1))

#print(sc.inverse_transform(ann.predict(sc.fit_transform([[144]]))))  # wrong!!!
print(sc1.inverse_transform(ann.predict(sc.transform([[144]]))))
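
As a quick sanity check after training (a sketch; the exact numbers depend on the trained weights), you can compare a few predictions against np.sqrt:

test_vals = np.array([[144.0], [625.0], [2500.0]])
preds = sc1.inverse_transform(ann.predict(sc.transform(test_vals)))
for v, p in zip(test_vals.ravel(), preds.ravel()):
    print(f"sqrt({v:.0f}) ~ {p:.3f} (true {np.sqrt(v):.3f})")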

Upvotes: 0

GPhilo

Reputation: 19153

TL;DR: You really need those nonlinearities.

In general, a model failing to learn can have one (or a combination) of several causes: a bad input data range, flaws in the data, over/underfitting, and so on.

However, in this specific case the model you built literally can't learn the function you're trying to approximate: without nonlinearities it is a purely linear model, and a linear model can't accurately approximate a nonlinear function like the square root.

A Dense layer is implemented as follows:

x_res = activ_func(w*x + b)

where x is the layer input, w the weights, b the bias vector and activ_func the activation function (if one is defined).

Your model, then, mathematically becomes (I'm using indices 1 to 3 for the three Dense layers):

pred = w3 * (w2 * ( w1 * x + b1 ) + b2 ) + b3
     = w3*w2*w1*x + w3*w2*b1 + w3*b2 + b3

As you see, the resulting model is still linear. Add activation functions and your model becomes capable of learning nonlinear functions too. From there, experiment with the hyperparameters and see how the performance of your model changes.
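
If you want to verify this collapse numerically, here is a minimal sketch: the shapes mirror the 1 -> 6 -> 6 -> 1 architecture from the question, with arbitrary random weights standing in for trained ones.

import numpy as np

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(1, 6)), rng.normal(size=6)   # hidden layer 1
w2, b2 = rng.normal(size=(6, 6)), rng.normal(size=6)   # hidden layer 2
w3, b3 = rng.normal(size=(6, 1)), rng.normal(size=1)   # output layer

x = np.array([[144.0]])

# Layer-by-layer forward pass with no activation functions
h1 = x @ w1 + b1
h2 = h1 @ w2 + b2
pred = h2 @ w3 + b3

# The same computation collapsed into a single affine map w*x + b
w = w1 @ w2 @ w3
b = b1 @ w2 @ w3 + b2 @ w3 + b3
print(np.allclose(pred, x @ w + b))  # True: the stack is one linear model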

Upvotes: 1
