PlusCoding

Reputation: 13

How to save the OCR model from the Keras example authored by A_K_Nain

I'm studying the TensorFlow OCR model from the Keras example authored by A_K_Nain. The model uses a custom object (a CTC layer) and is described at https://keras.io/examples/vision/captcha_ocr/. I trained the model on my own dataset and its predictions are perfect. I want to save and load this model, and I tried, but I got some errors, so I appended this code to the CTCLayer class:

def get_config(self):
    config = super(CTCLayer, self).get_config()
    config.update({"name": self.name})
    return config

After that I tried to save the whole model and its weights, but nothing worked, so I tried saving at two different points. First way:

history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=70,
    callbacks=[early_stopping],
)

model.save('./model/my_model')

---------------------------------------

new_model = load_model('./model/my_model', custom_objects={'CTCLayer': CTCLayer})

prediction_model = keras.models.Model(
  new_model.get_layer(name='image').input, new_model.get_layer(name='dense2').output
)

And the second way:

prediction_model = keras.models.Model(
  model.get_layer(name='image').input, model.get_layer(name='dense2').output
)

prediction_model.save('./model/my_model')

Neither approach worked. There is no error, but the prediction results are terrible. I get accurate results only when training, saving, and loading happen in the same session; if I load the same model in a new session without training, the results are very bad.

How can I use this model without retraining every time? Please help me.

Upvotes: 1

Views: 627

Answers (2)

Furqan Ali

Reputation: 727

The problem is not in the saved model but in the character list you use to map numbers back to strings. Every time you restart the notebook, the character list is rebuilt in a different order, so when you load your model it can no longer map the numbers back to the right characters. To resolve this, you need to save the character list as well. Follow the code below:

import tensorflow as tf
from tensorflow.keras.layers import StringLookup

train_labels_cleaned = []
characters = set()
max_len = 0

for label in train_labels:
  label = label.split(" ")[-1].strip()
  for char in label:
    characters.add(char)

  max_len = max(max_len, len(label))
  train_labels_cleaned.append(label)

print("Maximum length: ", max_len)
print("Vocab size: ", len(characters))

# Check some label samples
train_labels_cleaned[:10]

ff = list(characters)

# save list as pickle file
import pickle
with open("/content/drive/MyDrive/Colab Notebooks/OCR_course/characters", "wb") as fp:   #Pickling
    pickle.dump(ff, fp)

# Load character list again
import pickle
with open("/content/drive/MyDrive/Colab Notebooks/OCR_course/characters", "rb") as fp:   # Unpickling
    b = pickle.load(fp)
    print(b)

AUTOTUNE = tf.data.AUTOTUNE

# Mapping characters to integers
char_to_num = StringLookup(vocabulary=b, mask_token=None)

# Mapping integers back to original characters
num_to_chars = StringLookup(vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True)

Now when you map the numbers back to strings after prediction, the original order is retained and the predictions will be accurate.
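To see why persisting the ordered vocabulary fixes the problem, here is a minimal plain-Python sketch (no TensorFlow needed) of the same idea. The character list and prediction indices are hypothetical; the `+ 1` offset mirrors StringLookup reserving index 0 for the mask/OOV token:

```python
import pickle, tempfile, os

# Hypothetical vocabulary built from training labels (the order matters!)
characters = ["2", "3", "b", "c", "d"]  # exact order used at training time

# Persist the ordered vocabulary so a later session reuses the same mapping
path = os.path.join(tempfile.gettempdir(), "characters.pkl")
with open(path, "wb") as fp:
    pickle.dump(characters, fp)

# In a fresh session: reload and rebuild the index <-> character tables
with open(path, "rb") as fp:
    vocab = pickle.load(fp)

# StringLookup reserves index 0, so real characters start at index 1
num_to_char = {i + 1: c for i, c in enumerate(vocab)}

# Decoding a (hypothetical) prediction now gives the same string in any run
pred_indices = [3, 1, 4]
decoded = "".join(num_to_char[i] for i in pred_indices)
print(decoded)  # "b2c"
```

Had the vocabulary been rebuilt from an unordered set in the new session, the same indices could decode to a different string, which is exactly the "terrible predictions" symptom.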

If the logic is still unclear, you can watch my video, in which I explain this project from scratch and resolve all the issues you are facing:

https://youtu.be/ZiUEdS_5Byc

Upvotes: 0

elbe

Reputation: 1508

The problem does not come from TensorFlow. In the captcha_ocr tutorial, characters is a set, and sets are unordered, so the mapping from characters to integers produced by StringLookup depends on the current run of the notebook. That is why you get rubbish when using the model in another notebook without retraining: the mapping is not the same!
A solution is to use an ordered list instead of the set for characters:

characters = sorted(list(set([char for label in labels for char in label])))

Note that the set here ensures each character appears exactly once; it is then converted back to a list and sorted. The model will then work in any script/notebook without retraining (as long as the same formula is used to rebuild the vocabulary).
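A quick sketch with hypothetical labels shows that the sorted-list formula is deterministic, so every run rebuilds the identical vocabulary:

```python
# Hypothetical training labels
labels = ["2b8c", "c8b2", "d3bb"]

# Set for uniqueness, sorted for a stable, run-independent order
characters = sorted(list(set([char for label in labels for char in label])))
print(characters)  # ['2', '3', '8', 'b', 'c', 'd']

# Any notebook running this formula on the same labels gets the same list,
# so the character <-> integer mapping (e.g. via StringLookup) is reproducible.
```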

Upvotes: 0
