Reputation: 677
I'm struggling to understand TensorFlow, and I can't find good basic examples that don't rely on the MNIST dataset. I've tried to create a classification NN for some public datasets that provide a number of (unknown) features and a label for each sample. One of them provides around 90 audio-analysis features, with the year of publication as the label (https://archive.ics.uci.edu/ml/datasets/yearpredictionmsd).
Needless to say, I didn't manage to train the network, and there was little I could do to understand the provided features.
I'm now trying to generate artificial data and train a network on it. The data are pairs of numbers (a position), and the label is 1 if that position is inside a circle of radius r around an arbitrary point (5,5).
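import math
import numpy as np

numrows=10000
circlex=5
circley=5
circler=3
# random points in the square [0,10) x [0,10)
data = np.random.rand(numrows,2)*10
# distance of each point from the circle centre
labels = [ math.sqrt( math.pow(x-circlex, 2) + math.pow(y-circley, 2) ) for x,y in data ]
# True if the point lies inside the circle
labels = list( map(lambda x: x<circler, labels) )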
I've tried many combinations of network shape, parameters, optimizers, learning rates, etc. (I admit my math is not strong here), but either there's no convergence, or it sucks (70% accuracy on the last test).
Current version (labels converted to one-hot encoding, [1,0] for outside and [0,1] for inside):
import tensorflow as tf  # TensorFlow 1.x API

# model creation
graph=tf.Graph()
with graph.as_default():
    X = tf.placeholder(tf.float32, [None, 2] )
    layer1 = tf.layers.dense(X, 2)
    layer2 = tf.layers.dense(layer1, 2)
    Y = tf.nn.softmax(layer2)
    y_true = tf.placeholder(tf.float32, [None, 2] )
    loss=tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits_v2(logits=Y, labels=y_true) )
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# training
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for step in range(1000):
        _, l, predictions = session.run([optimizer,loss,Y], feed_dict={X:data, y_true:labels})
        if step % 100 == 0:
            print("Loss at step %d: %f" % (step, l))
            print("Accuracy %f" % accuracy(predictions, labels))
The accuracy in this example is around 70% (loss around 0.6).
The question is... what am I doing wrong?
I'm leaving the question as originally asked. Main lessons I learned:
Normalize your input data. The mean should be around 0, and the range ~ between -1 and 1.
[Figure: blue = normalized data, red = raw input data as created above]
Batch your input data. If the subsets used are random enough, it decreases the number of iterations needed without hurting accuracy too much.
Don't forget activation functions between layers :)
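To make the normalization lesson concrete, here is a minimal NumPy sketch. It assumes data shaped like the one generated in the question (uniform in [0, 10)); the seed and the variable names `mean`, `std` and `normalized` are just illustrative choices, and this is only one common way to standardize inputs.

```python
import numpy as np

rng = np.random.default_rng(0)        # seed chosen only for reproducibility
data = rng.random((10000, 2)) * 10    # same shape and range as in the question

# Standardize each feature (column) separately: zero mean, unit variance.
mean = data.mean(axis=0)
std = data.std(axis=0)
normalized = (data - mean) / std

print("mean per feature:", normalized.mean(axis=0))
print("std per feature: ", normalized.std(axis=0))
print("min / max:       ", normalized.min(), normalized.max())
```

For uniformly distributed inputs like these, standardizing leaves the values within roughly ±1.73 (the square root of 3) rather than exactly ±1. If a hard [-1, 1] range matters, min-max scaling such as `(data - 5) / 5` would be an alternative for this particular dataset.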
Upvotes: 4
Views: 306
Reputation: 17191
Plotting the synthetic data shows the two classes.
All outputs are classified as a single class, and because of the class imbalance the accuracy still comes out high, at about 70%.
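The 70% figure can be checked without any network at all. The sketch below regenerates data as in the question (the seed is an added assumption) and measures how often a point falls outside the circle, which is the accuracy of a model that always answers "outside".

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, added for reproducibility
numrows, circlex, circley, circler = 10000, 5, 5, 3
data = rng.random((numrows, 2)) * 10

# True where the point lies inside the circle.
inside = np.hypot(data[:, 0] - circlex, data[:, 1] - circley) < circler

# Circle area divided by the area of the 10 x 10 square.
expected_inside = np.pi * circler ** 2 / (10 * 10)
# Accuracy of a classifier that always predicts "outside".
majority_accuracy = 1.0 - inside.mean()

print("expected fraction inside: %.3f" % expected_inside)
print("measured fraction inside: %.3f" % inside.mean())
print("accuracy when always predicting 'outside': %.1f%%" % (100 * majority_accuracy))
```

Only about 28% of the points are inside the circle, so a constant "outside" prediction already scores a little above 70%.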
There are several issues with the network:
There is no non-linearity between the layers, so tf.nn.softmax( ((x*w1)+b1) * w2 + b2) squashes down to a single layer. There is just a single hyperplane trying to separate this input, and that hyperplane lies outside the input space; that's why all inputs get classified as a single class.
Softmax is applied twice: once on the logits and again inside the cross-entropy loss.
The whole dataset is fed in one go instead of in mini-batches.
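Two of the points above can be checked numerically with plain NumPy. This is only a sketch: the weights are random made-up values, not the ones TensorFlow would learn, and `softmax` is a small helper defined here for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; weights below are made up

# Two dense layers with no activation in between.
w1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)
w2, b2 = rng.normal(size=(2, 2)), rng.normal(size=2)
x = rng.random((5, 2)) * 10

two_layers = (x @ w1 + b1) @ w2 + b2

# The same mapping rewritten as a single dense layer.
w = w1 @ w2
b = b1 @ w2 + b2
one_layer = x @ w + b
print("two linear layers equal one layer:", np.allclose(two_layers, one_layer))

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Softmax applied twice: the second softmax only ever sees values in [0, 1],
# so even an extremely confident prediction gets flattened.
confident_logits = np.array([[100.0, -100.0]])
once = softmax(confident_logits)
twice = softmax(once)
print("probability after one softmax: ", once[0, 0])
print("probability after two softmaxes:", twice[0, 0])
```

With two classes, the second softmax cannot output more than e / (1 + e), about 0.73, for any class, so the cross-entropy per sample cannot drop below roughly 0.31 no matter how well the network separates the data.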
With two hidden units the output makes sense: the model has two hyperplanes trying to separate the data, and the final layer combines them in a way that minimizes the error.
With 3 hidden units we get 3 hyperplanes, and we can see the final layer adjusts these hyperplanes to separate the data well.
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# Normalize data
data = (data - np.mean(data)) / np.sqrt(np.var(data))
n_hidden = 3
batch_size = 128

# Feed batch data
def get_batch(inputX, inputY, batch_size):
    duration = len(inputX)
    for i in range(0, duration//batch_size):
        idx = i*batch_size
        yield inputX[idx:idx+batch_size], inputY[idx:idx+batch_size]

# Create the graph
tf.reset_default_graph()
graph = tf.Graph()
with graph.as_default():
    X = tf.placeholder(tf.float32, [None, 2])
    # one hidden layer with a non-linear activation
    layer1 = tf.layers.dense(X, n_hidden, activation=tf.nn.sigmoid)
    layer2 = tf.layers.dense(layer1, 2)
    Y = tf.nn.softmax(layer2)
    # integer class labels (0 = outside, 1 = inside), not one-hot
    y_true = tf.placeholder(tf.int32, [None])
    # the loss takes the raw logits (layer2), not the softmax output
    loss = tf.losses.sparse_softmax_cross_entropy(logits=layer2, labels=y_true)
    optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(Y, 1), tf.argmax(tf.one_hot(y_true, 2), 1)), tf.float32))

# training
with tf.Session(graph=graph) as session:
    session.run(tf.global_variables_initializer())
    for epoch in range(10):
        acc_avg = 0.
        loss_avg = 0.
        for step in range(10000//batch_size):
            for inputX, inputY in get_batch(data, labels, batch_size):
                _, l, acc = session.run([optimizer, loss, accuracy], feed_dict={X: inputX, y_true: inputY})
            acc_avg += acc
            loss_avg += l
        print("Loss at epoch %d: %f" % (epoch, loss_avg*batch_size/10000))
        print("Accuracy %f" % (acc_avg*batch_size/10000))
    # Get prediction
    pred = session.run(Y, feed_dict={X: data})

# Plotting function
import matplotlib.pylab as plt
plt.scatter(data[:,0], data[:,1], s=20, c=np.argmax(pred,1), cmap='jet', vmin=0, vmax=1)
plt.show()
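The get_batch generator above walks through the data in the same fixed order on every pass. That is fine here because the points were generated randomly, but for data that comes sorted it may help to reshuffle before each pass. A possible variant is sketched below; `get_shuffled_batches` and the tiny demo arrays are hypothetical names introduced only for this illustration.

```python
import numpy as np

def get_shuffled_batches(inputX, inputY, batch_size, rng):
    """Yield mini-batches in a fresh random order; the last batch may be smaller."""
    inputX = np.asarray(inputX)
    inputY = np.asarray(inputY)
    order = rng.permutation(len(inputX))  # new random order on every call
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield inputX[idx], inputY[idx]

# Tiny demo: 10 samples, where row i of X_demo is [2*i, 2*i + 1] and its label is i.
rng = np.random.default_rng(0)  # arbitrary seed
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
batches = list(get_shuffled_batches(X_demo, y_demo, 4, rng))
print([len(by) for _, by in batches])
```

Unlike get_batch, this variant also yields the final partial batch, so no samples are dropped when the dataset size is not a multiple of batch_size.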
Upvotes: 2