julianjm

Reputation: 677

Basic tensorflow classification example

I'm struggling to understand TensorFlow, and I can't find good basic examples that don't rely on the MNIST dataset. I've tried to create a classification NN for some public datasets that provide a number of (unknown) features and a label for each sample. There's one that provides around 90 features from audio analysis, with the year of publication as the label (https://archive.ics.uci.edu/ml/datasets/yearpredictionmsd).

Needless to say, I didn't manage to train the network, and I couldn't make much sense of the provided features either.

I'm now trying to generate artificial data and train a network on it. Each sample is a pair of numbers (a position), and the label is 1 if that position is inside a circle of radius r around an arbitrary point (5,5).

import math
import numpy as np

numrows = 10000
circlex = 5
circley = 5
circler = 3

# Random points in the 10x10 square, labelled True if they fall inside the circle
data = np.random.rand(numrows, 2) * 10
labels = [math.sqrt(math.pow(x - circlex, 2) + math.pow(y - circley, 2)) for x, y in data]
labels = list(map(lambda d: d < circler, labels))

I've tried many combinations of network shape, parameters, optimizers, learning rates, etc. (I admit the math is not strong on this one), but either there's no convergence, or the results are poor (70% accuracy on the last test).

Current version, with the labels converted to a one-hot encoding: [1,0] for outside and [0,1] for inside:
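For reference, a minimal sketch of that conversion (the exact code wasn't shown, so this is an assumption):

# Hypothetical one-hot conversion: False (outside) -> [1, 0], True (inside) -> [0, 1]
labels = np.array([[0, 1] if inside else [1, 0] for inside in labels], dtype=np.float32)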

# model creation

graph=tf.Graph()
with graph.as_default():
    X = tf.placeholder(tf.float32, [None, 2] )
    layer1 = tf.layers.dense(X, 2)
    layer2 = tf.layers.dense(layer1, 2)
    Y = tf.nn.softmax(layer2)
    y_true = tf.placeholder(tf.float32, [None, 2] )

    loss=tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits_v2(logits=Y, labels=y_true) )
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) 
      / predictions.shape[0])

# training
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for step in range(1000):
        _, l, predictions = session.run([optimizer,loss,Y], feed_dict={X:data, y_true:labels})
        if step % 100 == 0:
            print("Loss at step %d: %f" % (step, l)
            print("Accuracy %f" % accuracy(predictions, labels))

The accuracy in this example is around 70% (loss around 0.6).

The question is... what am I doing wrong?

UPDATE

I'm leaving the question as originally asked. Main lessons I learned:

Normalize your input data. The mean should be around 0, and the range roughly between -1 and 1.

[Plot: normalized vs. not normalized. Blue: normalized data, red: raw input data as created above.]
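A minimal sketch of that normalization (zero mean, unit variance; scaling per coordinate is my assumption, the answer below scales globally):

# Scale each coordinate to zero mean and unit variance
data = (data - data.mean(axis=0)) / data.std(axis=0)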

Batch your input data. If the subsets used are random enough, it decreases the number of iterations needed without hurting accuracy too much.
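For example, a minimal sketch of random mini-batching (the shuffling and the batch size of 128 are assumptions, not taken from the code above):

perm = np.random.permutation(len(data))        # new shuffle each epoch
for start in range(0, len(data), 128):
    idx = perm[start:start + 128]
    batch_x = data[idx]
    batch_y = np.asarray(labels)[idx]
    # ...run one training step on batch_x / batch_y...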

Don't forget activation functions between layers :)
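With tf.layers.dense that just means passing an activation for the hidden layer, e.g.:

# Hidden layer with a nonlinearity (sigmoid here; tf.nn.relu would also work)
layer1 = tf.layers.dense(X, n_hidden, activation=tf.nn.sigmoid)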

Upvotes: 4

Views: 306

Answers (1)

Vijay Mariappan

Reputation: 17191

The input:

Plotting the synthetic data with two classes.


Output from the code above:

All outputs are classified as a single class, and because of the class imbalance the accuracy is a deceptively high 70%.
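A quick sanity check of that baseline (using the 10x10 square and radius 3 from the question):

import math
inside_frac = math.pi * 3 ** 2 / (10 * 10)   # circle area / square area, roughly 0.28
print(1 - inside_frac)                        # roughly 0.72: always predicting "outside" already scores ~70%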



Issues with the code

  1. Even though two layers are defined, there is no activation function between them. So tf.nn.softmax(((x*w1) + b1) * w2 + b2) squashes down to a single layer: there is just one hyperplane trying to separate this input, and since that hyperplane lies outside the input space, all inputs get classified as a single class (see the sketch after this list).
  2. Bug: softmax is applied twice: to the logits as well as inside the cross-entropy loss.
  3. The entire input is given as a single batch, instead of mini-batches.
  4. Inputs need to be normalized.
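To illustrate point 1, a small NumPy check (with made-up weights) showing that two dense layers without an activation collapse into a single linear layer:

import numpy as np

X = np.random.rand(4, 2)
W1, b1 = np.random.rand(2, 2), np.random.rand(2)
W2, b2 = np.random.rand(2, 2), np.random.rand(2)

two_layers = (X @ W1 + b1) @ W2 + b2           # dense -> dense, no activation
one_layer = X @ (W1 @ W2) + (b1 @ W2 + b2)     # an equivalent single dense layer
print(np.allclose(two_layers, one_layer))      # True: no extra expressive power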

Fixing the above issues, the output becomes:

[plot: predictions after the fixes, with 2 hidden units]

The above output makes sense: the model has two hidden units, so we have two hyperplanes trying to separate the data. The final layer then combines these two hyperplanes in such a way as to minimize error.


Increasing the number of hidden units from 2 to 3:

[plot: predictions with 3 hidden units]

With 3 hidden units, we get 3 hyperplanes, and we can see the final layer adjusts these hyperplanes to separate the data well.


Code:

import numpy as np
import tensorflow as tf   # TF 1.x API; data and labels come from the question's snippet

# Normalize data: zero mean, unit variance
data = (data - np.mean(data)) / np.sqrt(np.var(data))
labels = np.asarray(labels, dtype=np.int32)   # class indices (0 = outside, 1 = inside), not one-hot
n_hidden = 3
batch_size = 128

# Feed batch data
def get_batch(inputX, inputY, batch_size):
    duration = len(inputX)
    for i in range(0, duration // batch_size):
        idx = i * batch_size
        yield inputX[idx:idx + batch_size], inputY[idx:idx + batch_size]

# Create the graph
tf.reset_default_graph()
graph = tf.Graph()
with graph.as_default():
    X = tf.placeholder(tf.float32, [None, 2])
    layer1 = tf.layers.dense(X, n_hidden, activation=tf.nn.sigmoid)
    layer2 = tf.layers.dense(layer1, 2)
    Y = tf.nn.softmax(layer2)
    y_true = tf.placeholder(tf.int32, [None])
    # Softmax is applied only once: the loss works on the raw logits
    loss = tf.losses.sparse_softmax_cross_entropy(logits=layer2, labels=y_true)
    optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(Y, 1), tf.argmax(tf.one_hot(y_true, 2), 1)), tf.float32))

    # Training
    with tf.Session(graph=graph) as session:
        session.run(tf.global_variables_initializer())
        n_batches = len(data) // batch_size
        for epoch in range(10):
            acc_avg = 0.
            loss_avg = 0.
            for inputX, inputY in get_batch(data, labels, batch_size):
                _, l, acc = session.run([optimizer, loss, accuracy], feed_dict={X: inputX, y_true: inputY})
                acc_avg += acc
                loss_avg += l
            print("Loss at epoch %d: %f" % (epoch, loss_avg / n_batches))
            print("Accuracy %f" % (acc_avg / n_batches))

        # Get predictions while the session is still open
        pred = session.run(Y, feed_dict={X: data})

# Plot every point colored by its predicted class
import matplotlib.pylab as plt
plt.scatter(data[:, 0], data[:, 1], s=20, c=np.argmax(pred, 1), cmap='jet', vmin=0, vmax=1)
plt.show()

Upvotes: 2
