Reputation: 387
I am trying to predict sentiment scores with a saved model downloaded from this notebook:
https://www.kaggle.com/paoloripamonti/twitter-sentiment-analysis/
It contains 4 saved models, namely:
I am using model.h5. My code is:
from keras.models import load_model
s_model = load_model('model.h5')
#predict the result
result = model.predict("HI my name is Mansi")
But it is unable to predict.
I think the error occurs because I have to tokenize and encode the text first, but I don't know how to do that using the multiple saved models.
Can anyone guide me through how to predict values and scores using the saved model from the notebook mentioned above?
Upvotes: 1
Views: 206
Reputation: 2701
The text has to be preprocessed before it is fed into the model. The following is a minimal working script (adapted from https://www.kaggle.com/paoloripamonti/twitter-sentiment-analysis/):
import time
import pickle
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model

# Load the saved model and the pickled tokenizer
model = load_model('model.h5')
with open('tokenizer.pkl', "rb") as f:
    tokenizer = pickle.load(f)

# Length every input sequence is padded to
SEQUENCE_LENGTH = 300
decode_map = {0: "NEGATIVE", 2: "NEUTRAL", 4: "POSITIVE"}
POSITIVE = "POSITIVE"
NEGATIVE = "NEGATIVE"
NEUTRAL = "NEUTRAL"
# Scores <= 0.4 are NEGATIVE, scores >= 0.7 are POSITIVE, the rest NEUTRAL
SENTIMENT_THRESHOLDS = (0.4, 0.7)

def decode_sentiment(score, include_neutral=True):
    if include_neutral:
        label = NEUTRAL
        if score <= SENTIMENT_THRESHOLDS[0]:
            label = NEGATIVE
        elif score >= SENTIMENT_THRESHOLDS[1]:
            label = POSITIVE
        return label
    else:
        return NEGATIVE if score < 0.5 else POSITIVE

def predict(text, include_neutral=True):
    start_at = time.time()
    # Tokenize text
    x_test = pad_sequences(tokenizer.texts_to_sequences([text]), maxlen=SEQUENCE_LENGTH)
    # Predict
    score = model.predict([x_test])[0]
    # Decode sentiment
    label = decode_sentiment(score, include_neutral=include_neutral)
    return {"label": label, "score": float(score),
            "elapsed_time": time.time()-start_at}
Test:
predict("hello")
Its output:
{'elapsed_time': 0.6313169002532959,
'label': 'POSITIVE',
'score': 0.9836862683296204}
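If it is unclear what the tokenize-and-pad step in predict() does to the text, here is a rough pure-Python sketch of the idea. It is not the Keras implementation: WORD_INDEX is a made-up toy vocabulary (the real one is stored inside tokenizer.pkl), to_sequence and pad_left are hypothetical helper names, and the word splitting is simplified. It assumes the default pad_sequences behaviour of padding and truncating at the front.

```python
import re

# Hypothetical toy vocabulary; the real mapping lives inside tokenizer.pkl.
WORD_INDEX = {"hi": 1, "my": 2, "name": 3, "is": 4}

def to_sequence(text, word_index):
    """Lowercase the text, split it into words, and map known words to ids."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Words missing from the vocabulary are skipped, which is what a Keras
    # Tokenizer without an oov_token does.
    return [word_index[w] for w in words if w in word_index]

def pad_left(sequence, maxlen):
    """Keep the last `maxlen` ids and left-pad with zeros to a fixed length."""
    trimmed = sequence[-maxlen:]
    return [0] * (maxlen - len(trimmed)) + trimmed

seq = to_sequence("HI my name is Mansi", WORD_INDEX)
padded = pad_left(seq, maxlen=8)
print(seq)     # [1, 2, 3, 4]  ("mansi" is not in the toy vocabulary)
print(padded)  # [0, 0, 0, 0, 1, 2, 3, 4]
```

The model only ever sees fixed-length rows of integers like `padded`, which is why passing a raw string to model.predict fails.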
Upvotes: 2