Reputation: 151
I am trying to save a one-hot encoder from Keras so that I can reuse it on different texts while keeping the same encoding.
Here is my code:
import pandas as pd
from keras.preprocessing.text import one_hot
df = pd.read_csv('dataset.csv')
vocab_size = 200000
encoded_docs = [one_hot(d, vocab_size) for d in df.text]
How can I save this encoder and use it again later?
I found this in my research, but one_hot() seems to be a function and not an object (sorry if this is plain wrong, I am fairly new to Python).
Upvotes: 8
Views: 12668
Reputation: 1681
The previous answer is great. Another available option is joblib (here clf stands for whatever object you want to save):
from joblib import dump, load
dump(clf, 'filename.joblib') # save the model
clf = load('filename.joblib') # load and reuse the model
Upvotes: 5
Reputation:
Posting the answer here (it is already in the comments) for the benefit of the community.
To save the encoder, you can use the code below:
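import pickle
from keras.preprocessing.text import one_hot
# Write the encoder function to a file named "encoder"
with open("encoder", "wb") as f:
    pickle.dump(one_hot, f)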
Then, to load the saved encoder, use the code below:
# Reopen the file in binary read mode before unpickling
with open("encoder", "rb") as f:
    encoder = pickle.load(f)
encoded_docs = [encoder(d, vocab_size) for d in df.text]
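One point worth keeping in mind about the code above: pickle stores a module-level function as a reference (module name plus function name), not as a copy of its code or of any state. The sketch below illustrates this with json.dumps as a stand-in for one_hot, an assumption made only so that the example runs without Keras installed.

```python
import pickle
from json import dumps as json_dumps  # stand-in for one_hot (assumption)

# Pickling a module-level function records only where to find it.
payload = pickle.dumps(json_dumps)

# The payload contains the module and function names, not the function body.
has_names = b"json" in payload and b"dumps" in payload

# Unpickling imports the module and looks the function up again,
# so we get back the very same function object.
restored = pickle.loads(payload)
same_object = restored is json_dumps

print(has_names, same_object)
```

So the pickled file does not hold any mapping from words to integers; whatever the function computes after loading depends on the process it runs in.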
Since the function one_hot (from keras.preprocessing.text import one_hot) uses hash() to generate quasi-unique encodings, we need to set a hash seed to reproduce our results (getting the same result even after multiple executions).
Run the below code in the terminal to set the hash seed:
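The exact command is not shown here. As a hedged sketch of the effect, assuming the hash seed refers to Python's standard PYTHONHASHSEED environment variable, the snippet below starts fresh interpreters with and without a fixed seed and compares the value of hash() for the same word. The helper name hash_in_new_process is hypothetical and chosen only for this illustration.

```python
import os
import subprocess
import sys

def hash_in_new_process(word, seed=None):
    """Return hash(word) as computed by a freshly started Python interpreter."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)        # start from an unseeded environment
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)  # fix the string-hash seed
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", word],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

# With a fixed seed, separate processes agree on the hash of a string.
seeded = [hash_in_new_process("keras", seed=0) for _ in range(2)]
print("seeded runs agree:", seeded[0] == seeded[1])

# Without a seed, each process normally picks its own random seed,
# so these two values will usually differ.
unseeded = [hash_in_new_process("keras") for _ in range(2)]
print("unseeded runs:", unseeded)
```

The variable has to be set before the interpreter starts; changing os.environ inside an already running script does not affect hash() in that process.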
Upvotes: 13