CuriousLearner

Reputation: 151

How to save one hot encoder?

I am trying to save a one-hot encoder from Keras so that I can reuse it on different texts while keeping the same encoding.

Here is my code:

import pandas as pd
from keras.preprocessing.text import one_hot

df = pd.read_csv('dataset.csv')
vocab_size = 200000
encoded_docs = [one_hot(d, vocab_size) for d in df.text]

How can I save this encoder and use it again later?

I found this in my research, but one_hot() seems to be a function and not an object (sorry if this is plain wrong, I am fairly new to Python).

Upvotes: 8

Views: 12668

Answers (2)

Memphis Meng

Reputation: 1681

The previous answer is awesome. Another available option is joblib:

from joblib import dump, load
dump(clf, 'filename.joblib')  # save the object (clf stands for whatever you want to persist)
clf = load('filename.joblib')  # load and reuse it

Upvotes: 5

user11530462

Reputation:

Posting the answer here (it already appears in the comments) for the benefit of the community.

To save the encoder, you can use the code below:

import pickle
from keras.preprocessing.text import one_hot

with open("encoder", "wb") as f:
    pickle.dump(one_hot, f)

Then, to load the saved encoder, use the code below:

with open("encoder", "rb") as f:
    encoder = pickle.load(f)
encoded_docs = [encoder(d, vocab_size) for d in df.text]
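One point worth keeping in mind: pickle stores a module-level function as a reference to its import path, not as a copy of its code or any internal state. The sketch below illustrates this with json.dumps from the standard library, used here only as a stand-in for one_hot so that the example runs without Keras.

```python
import json
import pickle

# Pickle a module-level function (json.dumps is only a stand-in for one_hot).
payload = pickle.dumps(json.dumps)

# The payload holds the module and function names, not the function body.
names_in_payload = b"json" in payload and b"dumps" in payload

# Unpickling re-imports the module and returns the very same function object.
restored = pickle.loads(payload)
same_object = restored is json.dumps

print(names_in_payload, same_object)
```

So the saved file does not carry a word-to-index mapping. Whether the loaded function reproduces the same encodings depends on how it hashes words, which is what the next paragraph addresses.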

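Because one_hot (from keras.preprocessing.text import one_hot) uses Python's built-in hash() to generate quasi-unique encodings, and hash() for strings is randomized per interpreter session by default, we need to fix the hash seed (the PYTHONHASHSEED environment variable) to reproduce our results, i.e. to get the same encoding across multiple executions.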
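To see the effect of a fixed seed, the sketch below starts fresh Python interpreters with the PYTHONHASHSEED environment variable set and compares the string hashes they produce. The helper name hash_in_fresh_interpreter and the seed value 0 are illustrative assumptions, not part of the original answer.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(word, seed):
    """Return hash(word) as computed by a new Python process with PYTHONHASHSEED=seed."""
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({word!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Two separate interpreter sessions with the same seed agree on the hash.
first = hash_in_fresh_interpreter("hello", seed=0)
second = hash_in_fresh_interpreter("hello", seed=0)
print(first == second)
```

The variable has to be set before the interpreter starts; changing os.environ inside an already running script does not affect that script's hash().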

Set the hash seed in the terminal before running the script:

[Image: terminal command setting the hash seed]
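If relying on the hash seed is inconvenient, one alternative is to hash words with a fixed algorithm so the encoding is the same in every session and nothing needs to be saved. The following is a minimal sketch under stated assumptions, not Keras's implementation: stable_one_hot is a hypothetical helper, tokenization is simplified to lowercasing and splitting on whitespace, and md5 is used only as a stable hash. Its indices will not match those produced by one_hot.

```python
import hashlib


def stable_one_hot(text, vocab_size):
    """Map each whitespace-separated word to an integer in [1, vocab_size - 1]."""
    indices = []
    for word in text.lower().split():
        # md5 gives the same digest in every interpreter session.
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        # Reserve index 0 (commonly used for padding) by shifting into 1..vocab_size-1.
        indices.append(int(digest, 16) % (vocab_size - 1) + 1)
    return indices


vocab_size = 200000
encoded = stable_one_hot("the cat sat on the mat", vocab_size)
print(len(encoded), encoded[0] == encoded[4])
```

As with one_hot, different words can collide on the same index; a larger vocab_size makes that less likely.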

Upvotes: 13
