Miken

Reputation: 23

TensorFlow: saver.restore not restoring

When I try to restore a trained model, I have a problem:

The first time my program runs, it doesn't seem to load the variables. The second time I run it, the variables are loaded. The third time, I get a huge error on the "saver.restore(sess, 'model.ckpt')" line, starting with "NotFoundError: Key beta2_power_2 not found in checkpoint".

Here is the beginning of my code:

with tf.Session() as sess:
    myModel = SoundCNN(8)#classes
    tf.global_variables_initializer().run() 

    saver = tf.train.Saver(tf.global_variables())

    saver.restore(sess, 'model.ckpt')

You can see the SoundCNN class here, in the model.py file of the GitHub project. I'm new to TensorFlow and ML and wanted to use awjuliani's project to learn to use TF for sound-oriented ML.

edit: here is the full code:

print ("start")
bpm = 240
samplingRate = 44100
mypath = "instruments/drums/"
iterations = 1000
batchSize = 240

with tf.Session() as sess:
    myModel = SoundCNN(8)#classes
    tf.global_variables_initializer().run() 

    saver = tf.train.Saver(tf.global_variables())
    print("loading session ...")
    saver.restore(sess, 'model.ckpt')
    print("session loaded")


    print("processing audio ...")
    classes,trainX,trainYa,valX,valY,testX,testY = util.processAudio(bpm,samplingRate,mypath)
    print("audio processed")

    fullTrain = np.concatenate((trainX,trainYa),axis=1)

    quitFlag = False

    inputsize = fullTrain.shape[0]-1 #6607

    print("entering loop...")
    while (not quitFlag):
        indexstr = input("Type the index (0< _ <" + str(inputsize) + ") of the sample to test then press enter.\nYou can press enter without text for random index.\nType q to quit.\n")

        if (indexstr == "q" or indexstr == "Q"):
            quitFlag = True
        else:
            if(indexstr ==""):
                index = randint(0, inputsize)
                print("Index : " + str(index))
            else:
                index = int(indexstr)     

            tensors,labels_ = np.hsplit(fullTrain,[-1])
            labels = util.oneHotIt(labels_)
            tensor, label = tensors[index,:], labels[index]

            tensor = tensor.reshape(1,1024)

            result = myModel.prediction.eval(session=sess,feed_dict={myModel.x: tensor, myModel.keep_prob: 1.0})

            print("Model found sound: n°"+ str(result) + ".\nActual sound: n°" + str(np.argmax(label)) + ".\n" )

Thanks!

edit2: Okay, I tried with this code:

print ("start")
bpm = 240
samplingRate = 44100
mypath = "instruments/drums/"
iterations = 1000
batchSize = 240


tf.reset_default_graph()
myModel = SoundCNN(8)
saver = tf.train.Saver()

with tf.Session() as sess:

    print("loading session ...")
    saver.restore(sess, 'model.ckpt')
    print("session loaded")

And the variables aren't loaded (bad predictions), but the strange thing is that I can make the code work by adding:

    myModel = SoundCNN(8)
    saver = tf.train.Saver()
    print("loading session ...")
    saver.restore(sess, 'model.ckpt')
    print("session loaded")

after the first saver.restore(sess, 'model.ckpt').

So I made the code work, but it's a nasty ...

Upvotes: 1

Views: 1670

Answers (1)

Ben

Reputation: 152

OK, so first of all, separate the training and the testing of the model. Run a conditional if statement using tf.train.checkpoint_exists and tf.train.latest_checkpoint, something like:

if tf.train.checkpoint_exists(tf.train.latest_checkpoint(".")):
    test()
else:
    trainNetConv(iterations)
    test()

You might as well use only latest_checkpoint, as it returns the checkpoint path if one was found and None otherwise.
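
For instance, a minimal sketch of that variant, assuming the checkpoint files live in the current directory and that trainNetConv and test are the same functions as above:

ckpt_path = tf.train.latest_checkpoint(".")  # None if no checkpoint is found
if ckpt_path is not None:
    test()                    # a checkpoint exists, go straight to testing
else:
    trainNetConv(iterations)  # no checkpoint yet, train first
    test()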

Run 'tf.reset_default_graph()' whenever you know you'll be loading a model, to clear any existing graph. From what I've experienced, it otherwise stacks copies of the graph, which slows the runtime, and I guess it might lead to other problems, especially if you plan to do this multiple times during a run.

Assuming you already have a trained model, you must first create it like you normally would, by calling SoundCNN with the same number of classes as the model you wish to load. Make sure you create the EXACT same model, i.e. the same number of classes. In the code you provided, you create the model with 8 classes, but the number of classes of the model created in 'trainNetConv' is determined by 'util.processAudio'. It's worth checking that the number of classes is indeed 8 for any given directory of sound files it's being trained on.

The key difference when you load a model is that you don't initialize the variables, i.e. you do not create the saver object with the global variables or run the global variables initializer. All you have to do is:

  1. Make sure to run tf.reset_default_graph()
  2. Create the model, call SoundCNN
  3. Create a saver object with no arguments.
  4. Create a session like you normally do.
  5. Call the saver object's restore function with the path to the latest checkpoint, using 'tf.train.latest_checkpoint' with the base dir of the model.
  6. And you're done (see the sketch below).
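
Put together, a minimal sketch of the loading path, assuming the checkpoint was saved from the same 8-class SoundCNN, sits in the current directory, and that SoundCNN lives in model.py as in the project:

import tensorflow as tf
from model import SoundCNN  # assumed location of the class

tf.reset_default_graph()        # 1. start from a clean graph
myModel = SoundCNN(8)           # 2. rebuild the exact same model (8 classes)
saver = tf.train.Saver()        # 3. saver over the freshly created variables

with tf.Session() as sess:      # 4. create the session
    ckpt = tf.train.latest_checkpoint(".")  # assumes a checkpoint exists here
    saver.restore(sess, ckpt)   # 5. restore instead of initializing
    # 6. done: the variables now hold the trained values, e.g.
    # myModel.prediction.eval(session=sess, feed_dict={myModel.x: tensor,
    #                                                  myModel.keep_prob: 1.0})

Note that there is no call to tf.global_variables_initializer() anywhere; restore takes its place.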

Check my GitHub for complete examples of the training and testing phases. Make sure to start with the 'mnist' one, since it is only one file and the simplest there.

Say you wish to define additional variables for your own use, for example some variable Counter and an op that increments Counter if the prediction is correct. They need to be placed after you have loaded the model using restore, and then you initialize only those additional variables. Again, I think my examples might help in this case.
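
A rough sketch of that order, where the counter variable and its name are purely illustrative:

import tensorflow as tf
from model import SoundCNN  # assumed location, as above

tf.reset_default_graph()
myModel = SoundCNN(8)
saver = tf.train.Saver()  # only covers the model variables defined so far

with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("."))

    # Extra bookkeeping variable defined AFTER the restore, so the saver
    # above never looks for it in the checkpoint.
    counter = tf.Variable(0, name="counter", trainable=False)
    increment_counter = tf.assign_add(counter, 1)
    # (a real version would only run this op when the prediction is correct)

    # Initialize only the new variable; the restored ones are left untouched.
    sess.run(tf.variables_initializer([counter]))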

If you have any more questions, please ask; I'll try to help.

Upvotes: 1
