
Reputation: 4247

How to train a neural network to convert integers to Roman numerals?

I am trying to train a neural net to convert an integer into a Roman numeral, but my loss won't go below 0.3. Can you help me figure out what I am doing wrong?

For input I am using integers ranging from 0 to 4000. I have tried using them (1) as-is, (2) normalized to z-scores, and (3) min-max scaled.
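For reference, the two scaled variants can be sketched in plain Python as below. The helper names (z_scores, min_max) are made up for illustration, and the notebook may compute these differently.

```python
import statistics

def z_scores(values):
    """Standardize to zero mean and unit variance (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = list(range(0, 4001))   # as-is inputs, using the range stated above
x_z = z_scores(raw)          # variant 2: z-scores
x_mm = min_max(raw)          # variant 3: min-max scaled
```

Both are monotonic linear transforms of the same single feature, so they change its scale but not the information it carries.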

For output y, I have 21 binary classes. They look like this:

{'MMM': 0, 'MM': 0, 'CM': 0, 'M': 0, 'CD': 0, 'D': 0, 'CCC': 0, 'CC': 0, 'XC': 0, 'C': 0, 'XL': 0, 'L': 0, 'XXX': 0, 'XX': 0, 'IX': 0, 'X': 0, 'IV': 0, 'V': 0, 'III': 0, 'II': 0, 'I': 0}

This template allows me to unambiguously represent any integer between 1 and 3,999. For example, 17 becomes:

{'MMM': 0, 'MM': 0, 'CM': 0, 'M': 0, 'CD': 0, 'D': 0, 'CCC': 0, 'CC': 0, 'XC': 0, 'C': 0, 'XL': 0, 'L': 0, 'XXX': 0, 'XX': 0, 'IX': 0, 'X': 1, 'IV': 0, 'V': 1, 'III': 0, 'II': 1, 'I': 0}

and 3885 becomes:

{'MMM': 1, 'MM': 0, 'CM': 0, 'M': 0, 'CD': 0, 'D': 1, 'CCC': 1, 'CC': 0, 'XC': 0, 'C': 0, 'XL': 0, 'L': 1, 'XXX': 1, 'XX': 0, 'IX': 0, 'X': 0, 'IV': 0, 'V': 1, 'III': 0, 'II': 0, 'I': 0}
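A minimal sketch of how such targets could be built and read back, assuming a greedy pass over the tokens from largest to smallest value with each token used at most once. The names LABELS, VALUES, encode and decode are illustrative and not taken from the notebook.

```python
# Label order copied from the template above.
LABELS = ['MMM', 'MM', 'CM', 'M', 'CD', 'D', 'CCC', 'CC', 'XC', 'C', 'XL',
          'L', 'XXX', 'XX', 'IX', 'X', 'IV', 'V', 'III', 'II', 'I']

VALUES = {'MMM': 3000, 'MM': 2000, 'M': 1000, 'CM': 900, 'D': 500, 'CD': 400,
          'CCC': 300, 'CC': 200, 'C': 100, 'XC': 90, 'L': 50, 'XL': 40,
          'XXX': 30, 'XX': 20, 'X': 10, 'IX': 9, 'V': 5, 'IV': 4,
          'III': 3, 'II': 2, 'I': 1}

# Tokens from largest to smallest value; this is also the order in which
# they are written in a Roman numeral.
BY_VALUE = sorted(LABELS, key=VALUES.get, reverse=True)

def encode(n):
    """Return the 21-label dict for an integer in 1..3999."""
    if not 1 <= n <= 3999:
        raise ValueError("n must be between 1 and 3999")
    target = dict.fromkeys(LABELS, 0)
    for token in BY_VALUE:
        if VALUES[token] <= n:   # greedy: take the token if it still fits
            target[token] = 1
            n -= VALUES[token]
    return target

def decode(target):
    """Join the active tokens, largest value first, into a Roman numeral."""
    return ''.join(t for t in BY_VALUE if target[t])

print(decode(encode(17)))    # XVII
print(decode(encode(3885)))  # MMMDCCCLXXXV
```

To feed the model, the dict values can be taken in LABELS order, for example list(encode(17).values()).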

My model looks like this:

model = tf.keras.models.Sequential()
model.add(Dense(56, input_shape=(1,), activation='relu'))
model.add(Dense(56, activation='relu'))
model.add(Dense(48, activation='relu'))

I have also tried the elu activation function, slightly larger and smaller numbers of neurons, and up to 2 additional layers.

I have tried learning rates between 0.1 and 0.001.

opt = Adam(learning_rate=0.1)

For the loss function I am using binary cross-entropy.

loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss)

I have also tried adding a sigmoid activation to the last layer along with from_logits=False.
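These two setups should give the same loss up to numerical precision: binary cross-entropy computed on sigmoid probabilities equals binary cross-entropy computed directly on the logits. A small plain-Python check of that identity follows; the helper names are illustrative and are not Keras functions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(y, p):
    """Binary cross-entropy for one label, given a probability p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(y, z):
    """The same loss computed from the raw logit z, in a numerically stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# The two formulations agree for every label/logit pair tried here.
for y in (0, 1):
    for z in (-3.0, -0.5, 0.0, 2.0):
        assert abs(bce_from_probability(y, sigmoid(z)) - bce_from_logit(y, z)) < 1e-9
```

So switching between the two correctly configured variants is not expected to change the plateau; what would differ is mixing them, such as a sigmoid output combined with from_logits=True.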

Nothing seems to work. The loss doesn't go below 0.3.

I have trained for up to 5000 epochs with batch sizes ranging from 500 to 2000.

h = model.fit(x, y, batch_size=512, epochs=400, verbose=1, shuffle=True)

Complete Google Colab notebook is here:

What do you think is the reason for the loss not going below 0.3? What do you suggest I try next?

Upvotes: 1

Views: 243

Answers (2)


Reputation: 1195

I would add back your sigmoid activation on the last layer and set from_logits=False.

You should also track some form of accuracy, as loss by itself doesn't tell you much other than raw progress. Keras can compute it for you if you pass it as a metric:

model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

I'd also consider creating a validation set to run through so you can see how the model performs on unseen data. There's no point in trying to force a higher accuracy on the training set if it means it 'over-learns' the patterns (overfitting) as this will cause it to perform worse on data it hasn't seen before:

h = model.fit(x, y, validation_data=(val_x, val_y), batch_size=512, epochs=400, verbose=1, shuffle=True)
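The call above assumes val_x and val_y already exist. One way to obtain them is sketched here in plain Python; the helper name train_val_split, the 80/20 split, and the toy data are all assumptions made for illustration.

```python
import random

def train_val_split(x, y, val_fraction=0.2, seed=42):
    """Shuffle indices once, then hold out the first val_fraction for validation."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train_x = [x[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    val_x = [x[i] for i in val_idx]
    val_y = [y[i] for i in val_idx]
    return train_x, train_y, val_x, val_y

# Toy data standing in for the real inputs and 21-label targets.
x = list(range(1, 4000))
y = [[n % 2] for n in x]
train_x, train_y, val_x, val_y = train_val_split(x, y)
```

Keras also accepts validation_split=0.2 in fit, which holds out the last 20% of the arrays before shuffling, so an explicit shuffled split like this one is the safer choice when the inputs are sorted.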

Note that whatever you pass in as a metric doesn't actually impact how the model learns; it's just a human-readable output. It's the loss function that affects how the model judges performance and, consequently, the size of the weight update step. So perhaps consider using a different loss function:

!pip install tensorflow_addons

import tensorflow_addons as tfa

loss = tfa.losses.SigmoidFocalCrossEntropy()

I've had good results with the above loss function before with multi-label problems such as this.
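For intuition, here is a plain-Python sketch of the focal loss idea for a single label: ordinary binary cross-entropy scaled down for predictions that are already confident and correct. The defaults alpha=0.25 and gamma=2.0 are assumed from the focal loss paper, and this is not the library's actual implementation.

```python
import math

def binary_cross_entropy(y, p):
    """Plain BCE for one label with predicted probability p."""
    p_t = p if y == 1 else 1.0 - p      # probability assigned to the true class
    return -math.log(p_t)

def sigmoid_focal_loss(y, p, alpha=0.25, gamma=2.0):
    """BCE weighted by alpha_t * (1 - p_t) ** gamma (alpha and gamma are assumed defaults)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return alpha_t * (1.0 - p_t) ** gamma * binary_cross_entropy(y, p)

easy = sigmoid_focal_loss(1, 0.9)   # confident and correct: heavily down-weighted
hard = sigmoid_focal_loss(1, 0.1)   # confident and wrong: keeps most of its weight
```

Because most of the 21 labels are 0 for any given number, a loss that focuses on the hard labels is a plausible thing to try, though nothing here shows that it fixes the plateau.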

Another idea would be to introduce a learning rate scheduler, which automatically drops the learning rate after a certain number of epochs where no change in the monitor occurs:

reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', min_delta=0.005, patience=2, factor=0.1, verbose=1, mode='max')

So we're monitoring the validation accuracy, but you can specify 'val_loss', 'loss', etc. (With metrics=['accuracy'], TensorFlow 2.x logs this metric as 'val_accuracy'; older versions used 'val_acc'.)

We wait for 2 epochs, and if val_accuracy hasn't increased (note mode='max', so it's checking for an increase) by at least 0.005, i.e. half a percentage point (min_delta=0.005), the learning rate is multiplied by 0.1 (factor=0.1), so it drops to 10% of its current value.
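To make the arithmetic concrete, here is a simplified plain-Python simulation of this plateau rule. It is a sketch and not the Keras implementation: it ignores cooldown and min_lr, and the metric history is invented.

```python
def simulate_reduce_on_plateau(history, lr, patience=2, factor=0.1, min_delta=0.005):
    """Return the learning rate in effect after each epoch (mode='max' behaviour)."""
    best = float('-inf')
    wait = 0
    lrs = []
    for value in history:
        if value > best + min_delta:   # a real improvement resets the counter
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:       # plateau: multiply the rate by factor
                lr = lr * factor
                wait = 0
        lrs.append(lr)
    return lrs

# Invented validation accuracies: two plateaus of two epochs each.
val_accuracy = [0.50, 0.60, 0.601, 0.602, 0.70, 0.70, 0.70]
lrs = simulate_reduce_on_plateau(val_accuracy, lr=0.1)
```

Starting from 0.1, the rate becomes roughly 0.01 after the first plateau and roughly 0.001 after the second, since each reduction multiplies it by factor.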

You then pass this in as a callback in your fit function:

h = model.fit(x, y, validation_data=(val_x, val_y), batch_size=512, epochs=400, verbose=1, shuffle=True, callbacks=[reduce_lr])


You're absolutely correct about accuracy being misleading. For multi-label classification I normally use top_k_categorical_accuracy, so with k=5 (recommended by some Google paper, iirc) the model is deemed correct if the true label appears in the top 5 predictions. But remember this won't actually affect how your model learns; it only changes your own interpretation of whether or not the model needs tweaking.
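The top-k rule in the paragraph above can be sketched in plain Python as follows. The helpers top_k_hit and top_k_accuracy are hypothetical and not Keras API, and the sketch assumes a single true label per sample; with several active labels per sample, as in the Roman numeral targets, which label counts as "the" true one is left open here.

```python
def top_k_hit(scores, true_index, k=5):
    """True if true_index is among the k highest-scoring positions."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_index in ranked[:k]

def top_k_accuracy(batch_scores, true_indices, k=5):
    """Fraction of samples whose true label lands in the top k predictions."""
    hits = [top_k_hit(s, t, k) for s, t in zip(batch_scores, true_indices)]
    return sum(hits) / len(hits)

scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05]
print(top_k_hit(scores, true_index=5, k=3))  # True: index 5 has the third-highest score
print(top_k_hit(scores, true_index=2, k=3))  # False: index 2 ranks fourth
```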

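To use top_k_categorical_accuracy, add it to the metrics parameter inside compile:

model.compile(optimizer=opt, loss=loss, metrics=['top_k_categorical_accuracy'])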

PS: I ran your code with the suggested changes and at one point accuracy did go up to 93%. However, this number is meaningless on its own: you must use some validation data to see how the model does on unseen data, as that is the whole point of creating the model in the first place. It could be scoring 93% on the training set but 85% on the validation set.

Once you've done all this and got to the point where you want to cry, I'd recommend checking out Weights & Biases, specifically a process called a "sweep". There's a bit of a learning curve, but I use it for all my machine learning projects. It lets you set a range of values for any parameter you like, e.g. learning_rate = [0.1, 0.001, 0.0001, ...], and runs the model many times over, searching for the best set of hyperparameters.

Upvotes: 1


Reputation: 4247

Adding additional synthetic features does the trick and the model will learn very quickly.

As the model is right now, it has only 1 input feature which is the integer number itself. This is a bad input feature as all values are unique. There is not a single value which appears more than once for the model to learn. This is almost like providing the id of the row.

Instead, we can provide 4 additional numbers which, when added together, give back the original number. For example:

  • 1235 gives 1000, 200, 30 and 5
  • 281 gives 0, 200, 80 and 1

By adding these features we are using our domain knowledge of the problem and solving the problem we had with the existing input feature where no input value ever repeated itself.
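A minimal sketch of how these place-value features could be computed. The function name place_value_features is made up for illustration, and how the features were fed to the model is not shown in this answer.

```python
def place_value_features(n):
    """Split n (0..9999) into its thousands, hundreds, tens and units components."""
    thousands = n // 1000 * 1000
    hundreds = n % 1000 // 100 * 100
    tens = n % 100 // 10 * 10
    units = n % 10
    return [thousands, hundreds, tens, units]

print(place_value_features(1235))  # [1000, 200, 30, 5]
print(place_value_features(281))   # [0, 200, 80, 1]
```

If the original integer is kept alongside these four values, the model would take 5 inputs per sample, so the first layer's input_shape would presumably become (5,).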


Upvotes: 1
