Jonathan Bechtel

Reputation: 3617

How to create an NLP processing pipeline with Keras

I regularly use scikit-learn pipelines to streamline model processing, and I'm wondering about the easiest way to do something similar with Keras in Tensorflow 2.0.

What I'd like to do is deploy a Keras model as an API endpoint, and then submit a piece of text in a numpy array to it and have it tokenized, padded and predicted. But I don't know the shortest path to do this.

Here's some sample code:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, Flatten
import numpy as np

sample_words = [
'The sky is blue',
'The sky delivers us many gifts',
'Wise men appreciate gifts for what they are, not what they are not',
'Wherever you go, there you are',
'Don\'t pass judgment onto others, or you will quickly be judged yourself'
]

y = np.array([1, 0, 1, 1, 0])

tokenizer = Tokenizer(num_words=10)
tokenizer.fit_on_texts(sample_words)

train_sequences = tokenizer.texts_to_sequences(sample_words)

train_sequences = pad_sequences(train_sequences, maxlen=7)
mod = Sequential([
    Embedding(10, 2, input_length=7),
    Flatten(),
    Dense(3, activation='relu'),
    Dense(1, activation='sigmoid')
])

mod.compile(optimizer='adam', loss='binary_crossentropy')
mod.fit(train_sequences, y)

The idea is that if someone submits a web form with the words 'The sky is pretty today', I can wrap the text in a numpy array, send it to the endpoint (which will be set up on Google Cloud), and have it tokenized, padded, and predicted.
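Without a pipeline object, the inference path described above has to be done by hand: the fitted Tokenizer and the padding length must travel with the model, and every incoming string gets the same preprocessing before predict(). A minimal sketch of that two-step flow, reusing the question's settings (the predict_text helper is illustrative, not a Keras API):

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Dense, Flatten

sample_words = [
    'The sky is blue',
    'The sky delivers us many gifts',
]
y = np.array([1, 0])

# Fit the vocabulary on the training text, exactly as in the question.
tokenizer = Tokenizer(num_words=10)
tokenizer.fit_on_texts(sample_words)

mod = Sequential([
    Input(shape=(7,), dtype='int32'),
    Embedding(10, 2),
    Flatten(),
    Dense(1, activation='sigmoid'),
])
mod.compile(optimizer='adam', loss='binary_crossentropy')
mod.fit(pad_sequences(tokenizer.texts_to_sequences(sample_words), maxlen=7),
        y, verbose=0)

def predict_text(text):
    """Apply the exact training-time preprocessing to one raw string."""
    seq = tokenizer.texts_to_sequences([text])  # tokenize with the fitted vocab
    padded = pad_sequences(seq, maxlen=7)       # pad to the training length
    return mod.predict(padded, verbose=0)       # shape (1, 1) probability

print(predict_text('The sky is pretty today'))
```

The drawback is that the tokenizer is a separate artifact from the model, so the endpoint has to pickle and reload it alongside the saved model.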

In scikit-learn it would be as simple as pipe = make_pipeline(tokenizer, mod), and then go from there.
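Strictly speaking, even that scikit-learn one-liner needs a small adapter, since Keras' Tokenizer does not implement sklearn's transformer interface. A hedged sketch of such an adapter (KerasTextTransformer is a made-up name, not a library class):

```python
from sklearn.base import BaseEstimator, TransformerMixin
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

class KerasTextTransformer(BaseEstimator, TransformerMixin):
    """Adapts Keras' Tokenizer + pad_sequences to sklearn's fit/transform API."""

    def __init__(self, num_words=10, maxlen=7):
        self.num_words = num_words
        self.maxlen = maxlen

    def fit(self, X, y=None):
        self.tokenizer_ = Tokenizer(num_words=self.num_words)
        self.tokenizer_.fit_on_texts(X)
        return self

    def transform(self, X):
        seqs = self.tokenizer_.texts_to_sequences(X)
        return pad_sequences(seqs, maxlen=self.maxlen)

vec = KerasTextTransformer().fit(['The sky is blue',
                                  'The sky delivers us many gifts'])
print(vec.transform(['The sky is pretty today']).shape)  # (1, 7)
```

From there make_pipeline(KerasTextTransformer(), mod) should work in principle, since a Keras model exposes fit/predict, though sklearn will not save or reload the Keras model for you.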

I have a feeling there are some solutions that involve tf.data.Dataset, but I was hoping Keras had something more user-friendly built in.

Upvotes: 2

Views: 718

Answers (1)

mb0850

Reputation: 593

Keras makes this easy in the sense that there is no need to explicitly build a pipeline.

A Keras model uses the TensorFlow backend to build a computation graph, which is loosely analogous to a scikit-learn pipeline.

Thus your mod is itself equivalent to a pipeline with the operations Embedding -> Flatten -> Dense -> Dense, and the mod.compile() call generates the TensorFlow computation graph.

Everything then comes together in the mod.fit() method, where you feed your inputs into the model (i.e. the pipeline) and it trains on your data.

To make the tokenization part of the model itself, you can use the TextVectorization layer.

This layer has basic options for managing text in a Keras model. It transforms a batch of strings (one sample = one string) into either a list of token indices (one sample = 1D tensor of integer token indices) or a dense representation (one sample = 1D tensor of float values representing data about the sample's tokens).

Code snapshot:

import tensorflow as tf

max_features = 5000  # maximum vocabulary size
max_len = 4          # pad/truncate output sequences to this length

text_dataset = tf.data.Dataset.from_tensor_slices(["foo", "bar", "baz"])

vectorize_layer = TextVectorization(
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=max_len
)

# Learn the vocabulary from the text-only dataset.
vectorize_layer.adapt(text_dataset.batch(64))

# The model maps raw strings to padded token indices.
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(1,), dtype=tf.string))
model.add(vectorize_layer)

input_data = [["foo qux bar"], ["qux baz"]]
model.predict(input_data)
>>>
array([[2, 1, 4, 0],
   [1, 3, 0, 0]])
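Applied to the question's data, this gives a model whose first layer is the vectorizer, so a deployed endpoint can accept raw strings with no separate tokenize-and-pad step. A minimal end-to-end sketch (layer sizes copied from the question; the exact import path for TextVectorization varies across TF 2.x versions):

```python
import numpy as np
import tensorflow as tf

sample_words = [
    'The sky is blue',
    'The sky delivers us many gifts',
    'Wise men appreciate gifts for what they are, not what they are not',
    'Wherever you go, there you are',
    "Don't pass judgment onto others, or you will quickly be judged yourself",
]
y = np.array([1, 0, 1, 1, 0])

# Learn the vocabulary directly from the raw training strings.
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=10, output_mode='int', output_sequence_length=7)
vectorize_layer.adapt(sample_words)

# Vectorization is the first step of the model, so the saved model
# (and hence the endpoint) accepts raw strings directly.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorize_layer,
    tf.keras.layers.Embedding(10, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(np.array([[s] for s in sample_words]), y, verbose=0)

# Raw text goes straight into predict(); no external tokenizer needed.
pred = model.predict(np.array([['The sky is pretty today']]), verbose=0)
print(pred)
```

Because the preprocessing lives inside the graph, model.save() captures everything the endpoint needs in one artifact.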

Upvotes: 1
