Using Keras to give a score for combination of words

Question

I have a file with a set of "words" for example:

1a 9( 9j = 2453
3a 4( 6j 0s = 2309
1 7( 8ll = 4934

It looks like random data but it isn't, it has a score for each set of "words". My file consists of about 1million lines and there is definately patterns in it. There are about 3600 unqiue individual words.

The end column contains a score for that particular arrangement of words.

I have encoded each line to ints and padded them with 0's and put them in a file called words.txt

an example of that file would be:

475,12,2495,2934,105,0,0,0,9384 (last column being the output score)

Now I have this code:

When I run it, it's loss/accuracy is very bad, loss is like 70000000.

What am I doing wrong?

from numpy import loadtxt
from itertools import islice
from keras.models import Sequential
from keras.layers import Dense
# load the dataset
dataset = loadtxt('words.txt', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=150, batch_size=10000)

_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

My goal is to predict the score of random combinations of words that I generate.

Log of fit:

:\AI>python main.py
Using TensorFlow backend.
2021-02-14 08:52:48.350476: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
C:\Users\fordy\AppData\Local\Programs\Python\Python37\lib\site-packages	ensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/150
1047711/1047711 [==============================] - 7s 7us/step - loss: 72945595445.1503 - accuracy: 0.2351
Epoch 2/150
1047711/1047711 [==============================] - 3s 3us/step - loss: 72940365091.2725 - accuracy: 0.0016
Epoch 3/150
1047711/1047711 [==============================] - 3s 3us/step - loss: 72922327250.8712 - accuracy: 0.0016
Epoch 4/150
1047711/1047711 [==============================] - 2s 2us/step - loss: 72883151430.7776 - accuracy: 0.0030
Epoch 5/150
1047711/1047711 [==============================] - 2s 2us/step - loss: 72815216732.1170 - accuracy: 0.0041
Epoch 6/150
1047711/1047711 [==============================] - 2s 2us/step - loss: 72711719248.6156 - accuracy: 0.0012
Epoch 7/150
1047711/1047711 [==============================] - 2s 2us/step - loss: 72566884174.8089 - accuracy: 1.5271e-05

Andrey · Accepted Answer

Your model is too small. Try adding embedding layer and LSTM:

model = Sequential()
model.add(tf.keras.layers.Embedding(3600, 12, input_length=8)) # <= adjust vocab size
model.add(tf.keras.layers.LSTM(8))
# model.add(tf.keras.layers.Dense(12, input_dim=8, activation='relu'))
# model.add(tf.keras.layers.Dense(8, activation='relu'))
model.add(Dense(1))

Using Keras to give a score for combination of words

Answers (1)

Related Questions