Svensk
Svensk

Reputation: 1

Using custom pre-trained word embeddings

I have a fairly simple script to classify intents from natural language queries working pretty well, to which I want to add a word embedding layer from a pre-trained custom model of 200 dims. I'm trying to help myself with this tutorial Keras pretrained_word_embeddings But with what I have achieved so far, the training is very very slow! and even worse the model doesn't learn, accuracy doesn't improve with each epoch, something impossible to handle. I think I have not configured the layers correctly or the parameters are not correct. Could you help with this??

with open("tf-kr_esp.json") as f:
rows = json.load(f)
for row in rows["utterances"]:
    w = nltk.word_tokenize(row["text"])
    words.extend(w)
    documents.append((w, row["intent"]))
    if row["intent"] not in classes:
        classes.append(row["intent"])

words = sorted(list(set(words)))
classes = sorted(list(set(classes)))


word_index = dict(zip(words, range(len(words))))

embeddings_index = {}
with open('embeddings.txt') as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, "f", sep=" ")
        embeddings_index[word] = coefs

num_tokens = len(words) + 2
embedding_dim = 200
hits = 0
misses = 0

# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
    # Words not found in embedding index will be all-zeros.
    # This includes the representation for "padding" and "OOV"
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        misses += 1
print("Converted %d words (%d misses)" % (hits, misses))

embedding_layer = Embedding(
    num_tokens,
    embedding_dim,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,
)

# create our training data
training = []
output_empty = [0] * len(classes)

for doc in documents:
    bag = []
    pattern_words = doc[0]
    for w in words:
        bag.append(1) if w in pattern_words else bag.append(0)
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    training.append([bag, output_row])

random.shuffle(training)
training = np.array(training, dtype="object")
train_x = list(training[:,0])
train_y = list(training[:,1])


int_sequences_input = tf.keras.Input(shape=(None,), dtype="int64")
embedded_sequences = embedding_layer(int_sequences_input)
x = layers.Conv1D(128, 5, activation="relu")(embedded_sequences)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
preds = layers.Dense(69, activation="softmax")(x)
model = tf.keras.Model(int_sequences_input, preds)
model.summary()

#sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(np.array(train_x), np.array(train_y), epochs=20, batch_size=128, verbose=1)

Epoch 1/20
116/116 [==============================] - 279s 2s/step - loss: 4.2157 - accuracy: 0.0485
Epoch 2/20
116/116 [==============================] - 279s 2s/step - loss: 4.1861 - accuracy: 0.0550
Epoch 3/20
116/116 [==============================] - 281s 2s/step - loss: 4.1607 - accuracy: 0.0550
Epoch 4/20
116/116 [==============================] - 283s 2s/step - loss: 4.1387 - accuracy: 0.0550
Epoch 5/20
116/116 [==============================] - 286s 2s/step - loss: 4.1202 - accuracy: 0.0550
Epoch 6/20
116/116 [==============================] - 284s 2s/step - loss: 4.1047 - accuracy: 0.0550
Epoch 7/20
116/116 [==============================] - 286s 2s/step - loss: 4.0915 - accuracy: 0.0550
Epoch 8/20
116/116 [==============================] - 283s 2s/step - loss: 4.0806 - accuracy: 0.0550
Epoch 9/20
116/116 [==============================] - 280s 2s/step - loss: 4.0716 - accuracy: 0.0550
Epoch 10/20
116/116 [==============================] - 283s 2s/step - loss: 4.0643 - accuracy: 0.0550

Upvotes: 0

Views: 470

Answers (1)

Sarim Sikander
Sarim Sikander

Reputation: 406

Can you mention how many number of class you have got? and also the embedding dimension is 200 that is okay but it is reality that pretrained vectors takes long time to train on the new embeddings. To make it more fast you can lower your input features in Convolutional layers. also you can use Adam as an optimizer instead of SGD. As SGD is much slower than Adam.

Upvotes: 0

Related Questions