Reputation: 329
I'm trying to learn a bit about TensorFlow/machine learning. As a starting point, I'm trying to create a model that is trained on a simple 1-D function (y=x^2) and see how it behaves for inputs outside of the training range.
The problem I'm having is that the training loss never really improves. I'm sure it's due to a lack of understanding and/or misconfiguration on my part, but there really doesn't seem to be any sort of "baby's first machine learning" tutorial out there that deals with a dataset of a known form.
My code is pretty simple, and is borrowed from TensorFlow's introduction notebook here:
import tensorflow as tf
import numpy as np
# Load the dataset
x_train = np.linspace(0,10,1000)
y_train = np.power(x_train,2.0)
x_test = np.linspace(8,12,100)
y_test = np.power(x_test,2.0)
# (x_train, y_train), (x_test, y_test) = mnist.load_data()
# x_train, x_test = x_train / 255.0, x_test / 255.0
"""Build the `tf.keras.Sequential` model by stacking layers. Choose an optimizer and loss function for training:"""
from tensorflow.keras import layers
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])
"""Train and evaluate the model:"""
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
and I get output like this:
Train on 1000 samples
Epoch 1/5
1000/1000 [==============================] - 0s 489us/sample - loss: 1996.3631 - mae: 33.2543
Epoch 2/5
1000/1000 [==============================] - 0s 36us/sample - loss: 1996.3540 - mae: 33.2543
Epoch 3/5
1000/1000 [==============================] - 0s 36us/sample - loss: 1996.3495 - mae: 33.2543
Epoch 4/5
1000/1000 [==============================] - 0s 33us/sample - loss: 1996.3474 - mae: 33.2543
Epoch 5/5
1000/1000 [==============================] - 0s 38us/sample - loss: 1996.3450 - mae: 33.2543
100/1 - 0s - loss: 15546.3655 - mae: 101.2603
Like I said, I'm positive this is a misconfiguration/lack of understanding on my part. I really learn best when I can take something this simple and incrementally make it more complex, rather than starting with something whose patterns I can't readily identify, but I can't find any tutorials, etc. that take this approach. Can anyone recommend a good tutorial source, or just educate me on what I'm doing wrong here?
Upvotes: 0
Views: 528
Reputation: 1928
I think you have a mix of problems here. Let me explain them one by one:
First of all, the problem you want to solve is learning the function f = x^2, which makes it a regression task. For a regression task (and any other task ^_^) you should pay attention to the activation function and also to what you are actually trying to predict.
You have chosen softmax as the activation of the last layer, which does not make sense at all for regression. I suggest replacing it with a linear activation (if you remove the activation argument completely, TF/Keras treats it as linear automatically).
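For example, these two output layers are equivalent, because Dense uses no (i.e. linear) activation by default:

tf.keras.layers.Dense(1)                        # no activation argument -> linear
tf.keras.layers.Dense(1, activation='linear')   # the same, written out explicitly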
Second, why does your last Dense layer have 10 units? For each input you want to predict a single value (for an input of 5 you want to predict 25, right?), so a single Dense unit is enough to produce your output.
Also, since your network is not big, I would start with SGD as the optimizer, but Adam might work well too. And for the problem you are trying to solve, I don't believe you really need 128 units in the first hidden layer; you can start with a smaller number, say 3-4 units, and see how it goes. A sketch of the compile call with SGD is shown below.
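Keeping your loss and metric, the compile call with SGD could look like this ('sgd' here just means Keras' default SGD settings):

model.compile(optimizer='sgd', loss='mse', metrics=['mae'])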
Long story short, let's replace your model with these lines, and hopefully it will start learning:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1)
])
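If it helps, here is a minimal end-to-end sketch of what the whole script could look like with these changes. The hidden-layer size, learning rate, and epoch count are my own guesses (not something from the question) and will likely need tuning:

import numpy as np
import tensorflow as tf

# Same toy dataset as in the question, reshaped to explicit (samples, 1) columns
x_train = np.linspace(0, 10, 1000).reshape(-1, 1)
y_train = np.power(x_train, 2.0)
x_test = np.linspace(8, 12, 100).reshape(-1, 1)
y_test = np.power(x_test, 2.0)

# Small regression network: one hidden layer, single linear output unit
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(1,)),
    tf.keras.layers.Dense(1)  # no activation argument -> linear output
])

# SGD (suggested above) is also worth trying; Adam with a slightly larger
# learning rate is used here because the targets (up to 100) are not normalized
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss='mse',
              metrics=['mae'])

model.fit(x_train, y_train, epochs=200, verbose=0)
model.evaluate(x_test, y_test, verbose=2)

The explicit reshape to (samples, 1) just makes the input shape unambiguous for the Dense layers.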
Upvotes: 2