Creating a Text Classifier with Other Data Features in TensorFlow 2.0/Keras

Question

Main question: How do I create a neural network that can classify text data along with numerical features?

It sounds simple, but I must not be understanding something correctly.

Background

I'm trying to build a text classifier (for the first time) using TensorFlow 2/Keras that looks through app store reviews and classifies them into the following categories: happy, pricingIssue, techIssue, productIssue, miscIssue.

I have a dataset that contains star_rating, review_text, and the associated labels.

Problem

My understanding from the TensorFlow Hub text classification tutorial is that I need to use a TensorFlow Hub layer to embed each sentence as a fixed-shape vector.

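import tensorflow as tf
import tensorflow_hub as hub

embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"

# Pre-trained layer that maps each raw string to a 20-dimensional vector
hub_layer = hub.KerasLayer(embedding, input_shape=[], dtype=tf.string, trainable=True)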

And then I would build the model using that as my input layer.

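model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
# One output unit per category (softmax assumes one label per review);
# the tutorial's single sigmoid unit only covers binary labels
model.add(tf.keras.layers.Dense(5, activation='softmax'))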

So my question is: where do I feed the numerical star rating into the model?

Potential Solutions?

Use two input layers and merge them somehow? I would use the hub layer to embed the text, a second input layer for the numerical data, and then pipe both into the next layer.

Do I embed the string first and then append the rating to it? I could also see writing a preprocessing function that embeds each review into an array, appends the rating to the end of the embedding vector, and uses the whole thing as the model's input.

I'm stumped and any guidance is helpful!!

Phillip Geltman · Accepted Answer

I consulted an expert: both of the above solutions can work, but they have different trade-offs:

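Using two input layers: You can do this, but not with a Sequential model, because the layers no longer form a single linear stack. The model becomes a more general graph with two inputs, which you build with the Keras functional API.
Embed the string first: Because the embedding layer is pre-trained, the embedding step doesn't need to happen inside the model. The text can be embedded up front and concatenated with the numerical rating into a single input tensor. The trade-off is that the embeddings then stay fixed instead of being fine-tuned during training, as trainable=True would otherwise allow.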
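A minimal sketch of the first option with the Keras functional API, assuming TF 2.x with tf.keras (as in the question) and integer class labels 0-4. `build_two_input_model` is a hypothetical helper name; `text_encoder` can be the question's `hub_layer` or any layer that turns a batch of strings into fixed-size vectors. Because the hub layer lives inside the model here, `trainable=True` lets its weights be fine-tuned along with the rest of the network.

```python
import tensorflow as tf

NUM_CLASSES = 5  # happy, pricingIssue, techIssue, productIssue, miscIssue


def build_two_input_model(text_encoder, num_classes=NUM_CLASSES):
    """Build a classifier that takes raw review text and a star rating.

    text_encoder: a Keras layer mapping a batch of strings to fixed-size
    float vectors (e.g. the question's hub_layer).
    """
    # Two separate inputs, so this cannot be a Sequential model
    text_in = tf.keras.Input(shape=(), dtype=tf.string, name="review_text")
    rating_in = tf.keras.Input(shape=(1,), dtype=tf.float32, name="star_rating")

    # Embed the text inside the model, then join it with the rating
    text_vec = text_encoder(text_in)
    merged = tf.keras.layers.Concatenate()([text_vec, rating_in])

    x = tf.keras.layers.Dense(16, activation="relu")(merged)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs=[text_in, rating_in], outputs=outputs)
    # Assumes integer labels 0..num_classes-1, hence the sparse loss
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Usage with the question's hub_layer (downloads the module, so not run here);
# texts, ratings, and labels are hypothetical NumPy arrays:
# model = build_two_input_model(hub_layer)
# model.fit({"review_text": texts, "star_rating": ratings}, labels, epochs=10)
```

Naming the inputs lets `fit` receive a dict keyed by input name, which avoids mixing up the order of the text and rating arrays.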

Since I'm most familiar with TensorFlow 2 and the Keras Sequential API, I opted for the second option so I can keep using a Sequential model.
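A minimal sketch of the second option, assuming TF 2.x with tf.keras, integer class labels 0-4, and 1-5 star ratings. `embed_and_append_rating` and `build_sequential_classifier` are hypothetical helper names, and `embed_fn` stands in for the question's `hub_layer`. The text is embedded once outside the model, so the network itself is an ordinary Sequential stack over a feature vector of 21 values (the 20-dimensional gnews-swivel embedding plus the rating).

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 5  # happy, pricingIssue, techIssue, productIssue, miscIssue


def embed_and_append_rating(texts, ratings, embed_fn):
    """Embed texts outside the model and append the star rating as a column.

    embed_fn: callable mapping a 1-D string tensor to a (batch, dim) array,
    e.g. the question's hub_layer. Returns float32 of shape (batch, dim + 1).
    """
    text_vecs = np.asarray(embed_fn(tf.constant(texts)), dtype=np.float32)
    # Rescale 1-5 stars to [0, 1]; optional, but keeps inputs on a small scale
    rating_col = (np.asarray(ratings, dtype=np.float32).reshape(-1, 1) - 1.0) / 4.0
    return np.concatenate([text_vecs, rating_col], axis=1)


def build_sequential_classifier(input_dim, num_classes=NUM_CLASSES):
    """Plain Sequential stack over the precomputed feature vectors."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Usage with the question's hub_layer (not run here); texts, ratings, and
# labels are hypothetical. The embeddings are computed once and stay fixed,
# so hub_layer's trainable=True has no effect in this setup.
# X = embed_and_append_rating(texts, ratings, hub_layer)   # shape (N, 21)
# model = build_sequential_classifier(X.shape[1])
# model.fit(X, labels, epochs=10)
```

Precomputing the embeddings also means they only have to be calculated once, rather than on every training epoch.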
