Reputation: 21
I trained an LSTM network on the Yelp restaurant dataset (https://www.yelp.com/dataset). It is a large dataset, and training took several days on my PC. I saved the model and weights, and now I want to use them for real-time sentiment predictions.
What is the common / good / best practice for doing this? I load the model and the weights, then compile it. That part is not an issue; there are plenty of examples in the documentation and on the Internet. But what next? Is all I need to do to tokenize the newly received review, pad it, and pass it to model.predict?
tokenizer = Tokenizer(num_words = 2500, split=' ')
tokenizer.fit_on_texts(data['text'].values)
print(tokenizer.word_index)
X = tokenizer.texts_to_sequences(data['text'].values)
X = pad_sequences(X)
It cannot be that simple… If that is all that is required, then how is it connected to the tokenizer that was used to train the model? Tokenizing more than 2.5 million reviews downloaded from the Yelp dataset was an expensive operation.
Thank you for any suggestions.
Upvotes: 0
Views: 761
Reputation: 21
Yes, thank you, that worked perfectly. Just for completeness of this thread:
I saved / loaded the tokenizer using:
import pickle

def save_tokenizer(file_path, tokenizer):
    with open(file_path, 'wb') as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_tokenizer(file_path):
    with open(file_path, 'rb') as handle:
        tokenizer = pickle.load(handle)
    return tokenizer
Then used the tokenizer for predictions:
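import numpy as np
# u: helper module (not shown) that holds load_tokenizer from above
tokenizer = u.load_tokenizer("SavedModels/tokenizer.pcl")
X = tokenizer.texts_to_sequences(data['text'].values)
# maxLength should be the sequence length used during training
X = pad_sequences(X, maxlen=maxLength)
print(X)
model = u.load_model_from_prefix("single layer")
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
prediction = model.predict(X)
print(prediction)
# argmax along axis 1 gives one predicted class index per review
print(np.argmax(prediction, axis=1))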
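A note on maxlen, in case it helps someone: without it, pad_sequences pads only to the longest sequence in the current batch, so a single new review would usually come out with a different length than the training data had. Below is a rough pure-Python sketch of what the default "pre" padding and truncating does. The function name pad_pre, the value of max_length, and the sample sequences are made up for illustration; this is not the Keras implementation.

```python
def pad_pre(sequences, maxlen, value=0):
    """Sketch of pad_sequences' default behaviour: pad and truncate at the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # too long: keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # too short: zeros in front
    return padded


max_length = 6  # hypothetical; use the input length the model was trained with
batch = [[3, 5, 4, 1], [7, 2, 9, 9, 4, 1, 8, 6]]
print(pad_pre(batch, max_length))
# [[0, 0, 3, 5, 4, 1], [9, 9, 4, 1, 8, 6]]
```

Every row ends up with exactly max_length entries, which is what a model built with a fixed input length expects.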
Thanks for your help.
Upvotes: 2
Reputation: 7129
You will want to save the Tokenizer and reuse it at inference time to make sure that your test sentence is decomposed into the correct integers. See this answer for an example of how to do this.
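To illustrate why, here is a minimal sketch with a toy vocabulary builder standing in for the Keras Tokenizer. The functions fit_vocab and to_sequences and the sample sentences are hypothetical; the real Tokenizer also handles filtering, num_words and so on. The point is that a vocabulary fitted on new text assigns different integers to the same words, while the saved one reproduces the training-time mapping.

```python
import pickle
from collections import Counter


def fit_vocab(texts):
    """Toy stand-in for Tokenizer.fit_on_texts: more frequent words get lower indices."""
    counts = Counter(w for t in texts for w in t.lower().split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    # Indices start at 1; 0 is left free for padding.
    return {w: i + 1 for i, w in enumerate(ordered)}


def to_sequences(vocab, texts):
    """Toy stand-in for texts_to_sequences: words missing from the vocabulary are dropped."""
    return [[vocab[w] for w in t.lower().split() if w in vocab] for t in texts]


train_texts = ["the food was great", "the service was slow", "great food great price"]
new_review = ["the price was great"]

trained_vocab = fit_vocab(train_texts)

# Round-trip through pickle, as you would between training and serving.
restored_vocab = pickle.loads(pickle.dumps(trained_vocab))

# A vocabulary fitted only on the new review learns a different mapping.
fresh_vocab = fit_vocab(new_review)

seq_trained = to_sequences(trained_vocab, new_review)
seq_restored = to_sequences(restored_vocab, new_review)
seq_fresh = to_sequences(fresh_vocab, new_review)

print(seq_trained)   # [[3, 5, 4, 1]]
print(seq_restored)  # [[3, 5, 4, 1]]
print(seq_fresh)     # [[3, 2, 4, 1]]
```

The model's embedding layer only knows the integers from the first mapping, so feeding it sequences from a freshly fitted tokenizer would silently map words to the wrong embeddings.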
Upvotes: 2