Abhinav Aggarwal

Reputation: 1363

Possible incorrect usage of custom eval_metric in MXNet

I am working on a problem that I am trying to solve using MXNet, and I want to use a custom metric. The code for it is:

import numpy
import operator

def calculate_sales_from_bucket(bucketArray):
    # Convert each row's predicted bucket index back into a sales value (10^index).
    return numpy.asarray(numpy.power(10, calculate_max_index_from_bucket(bucketArray)))

def calculate_max_index_from_bucket(bucketArray):
    # For each row, take the index of the largest bucket score.
    answerArray = []
    for bucketValue in bucketArray:
        index, value = max(enumerate(bucketValue), key=operator.itemgetter(1))
        answerArray.append(index)
    return answerArray


def custom_metric(label, bucketArray):
    # Mean squared error between de-bucketed label sales and predicted sales.
    return numpy.mean(numpy.power(calculate_sales_from_bucket(label) - calculate_sales_from_bucket(bucketArray), 2))

model.fit(
    train_iter,         # training data
    eval_data=val_iter, # validation data
    batch_end_callback = mx.callback.Speedometer(batch_size, 1000),    # output progress for each 1000 data batches
    num_epoch = 10,     # number of data passes for training 
    optimizer = 'adam',
    eval_metric = mx.metric.create(custom_metric),
    optimizer_params=(('learning_rate', 1),)
)

I am getting the output as:

INFO:root:Epoch[0] Validation-custom_metric=38263835679935.953125
INFO:root:Epoch[1] Batch [1000]    Speed: 91353.72 samples/sec    Train-custom_metric=39460550891.057487
INFO:root:Epoch[1] Batch [2000]    Speed: 96233.05 samples/sec    Train-custom_metric=9483.127650
INFO:root:Epoch[1] Batch [3000]    Speed: 90828.09 samples/sec    Train-custom_metric=57538.891485
INFO:root:Epoch[1] Batch [4000]    Speed: 93025.54 samples/sec    Train-custom_metric=59861.927745
INFO:root:Epoch[1] Train-custom_metric=8351.460495
INFO:root:Epoch[1] Time cost=9.466
INFO:root:Epoch[1] Validation-custom_metric=38268.250469
INFO:root:Epoch[2] Batch [1000]    Speed: 94028.96 samples/sec    Train-custom_metric=58864.659051
INFO:root:Epoch[2] Batch [2000]    Speed: 94562.38 samples/sec    Train-custom_metric=9482.873310
INFO:root:Epoch[2] Batch [3000]    Speed: 93198.68 samples/sec    Train-custom_metric=57538.891485
INFO:root:Epoch[2] Batch [4000]    Speed: 93722.89 samples/sec    Train-custom_metric=59861.927745
INFO:root:Epoch[2] Train-custom_metric=8351.460495
INFO:root:Epoch[2] Time cost=9.341
INFO:root:Epoch[2] Validation-custom_metric=38268.250469

In this case, even though Train-custom_metric changes from batch to batch, the per-batch values repeat across epochs. For example, batches 3000 and 4000 report identical values in epoch 1 and epoch 2.

I believe this is an issue, because the end-of-epoch Train-custom_metric and Validation-custom_metric do not change from one epoch to the next. I am a beginner in MXNet, so I might be wrong in this assumption.

Can you confirm if I am passing eval_metric in the correct way?

Upvotes: 0

Views: 147

Answers (1)

Simon Corston-Oliver

Reputation: 226

Not sure I understand the problem. Your output shows Train-custom_metric giving different values; it just happens to have given the same result for the last two batches of each epoch. That may just be a quirk of how your model is converging.

One thing to be clear on is that eval_metric is only used to give debug output -- it's not actually used as the loss function during learning:

https://github.com/apache/incubator-mxnet/issues/1915
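
To make this concrete, here is a minimal sketch, assuming the Module API from your question and a made-up network (the layer sizes, the 10-bucket softmax output and the variable names are illustrative assumptions, not your actual model). The loss the optimizer minimises is defined by the symbol's output layer; eval_metric only scores predictions to produce the log lines you see:

import mxnet as mx

# Illustrative network only -- layer sizes and output type are assumptions.
data = mx.sym.Variable('data')
fc1  = mx.sym.FullyConnected(data=data, num_hidden=64)
act1 = mx.sym.Activation(data=fc1, act_type='relu')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10)   # e.g. 10 sales buckets
net  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')    # this defines the actual training loss (cross-entropy)

model = mx.mod.Module(symbol=net)

model.fit(
    train_iter,
    eval_data=val_iter,
    num_epoch=10,
    optimizer='adam',
    # mx.metric.create() wraps a plain feval(label, pred) callable, so this is
    # a supported way to pass it -- but it is only computed for logging and
    # does not affect the gradients.
    eval_metric=mx.metric.create(custom_metric),
)

So if you want the network to optimise something else, you change the output layer (or define a custom loss symbol), not the eval_metric.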

Upvotes: 1
