Reputation: 725
The Adam optimizer has flaws when used with weight decay, and in 2018 the AdamW optimizer was proposed to address them.
Is there a standard way to implement AdamW in the MXNet framework (Python implementation)? There is an mxnet.optimizer.Adam class, but no mxnet.optimizer.AdamW one (checked in the mxnet-cu102==1.6.0 and mxnet==1.5.0 package versions).
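For example, a quick check along these lines (a minimal snippet; it just inspects mxnet.optimizer with standard Python introspection) shows an Adam class but no AdamW:

```python
import mxnet as mx

# Inspect which optimizer classes this MXNet build exposes.
print(mx.__version__)                  # 1.5.0 / 1.6.0 in the versions above
print(hasattr(mx.optimizer, 'Adam'))   # True
print(hasattr(mx.optimizer, 'AdamW'))  # False -- no AdamW class here
```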
P.S. I asked this question on the MXNet forum and on datascience.stackexchange.com, but to no avail.
Upvotes: 1
Views: 357
Reputation: 20287
Short answer: There isn't a standard way to use AdamW in Gluon yet, but there is some existing work in that direction that would make it relatively easy to add.
Longer answer:
People have been asking for this feature - a lot :) See: https://github.com/apache/incubator-mxnet/issues/9182
Gluon-NLP has a working version of AdamW - possibly slightly different from the one in the original paper: https://github.com/eric-haibin-lin/gluon-nlp/blob/df63e2c2a4d6b998289c25a38ffec8f4ff647ff4/src/gluonnlp/optimizer/bert_adam.py
The adamw_update() operator was added with this pull request: https://github.com/apache/incubator-mxnet/pull/13728 and was first released in MXNet 1.6.0.
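For example, the operator can be applied to a weight, its gradient, and the Adam state arrays directly. This is only a sketch: the keyword names (lr, eta, wd, beta1, beta2, epsilon) and the NDArray rescale_grad input are assumptions based on the pull request above, so double-check them against the operator documentation in your MXNet version.

```python
import mxnet as mx

# One hand-rolled AdamW step with the contrib operator (MXNet >= 1.6.0).
weight = mx.nd.random.normal(shape=(5,))
grad = mx.nd.random.normal(shape=(5,))
mean = mx.nd.zeros_like(weight)    # first-moment state (m)
var = mx.nd.zeros_like(weight)     # second-moment state (v)
rescale_grad = mx.nd.ones((1,))    # assumed to be an NDArray input for this op

mx.nd.contrib.adamw_update(
    weight, grad, mean, var, rescale_grad,
    lr=1e-3,                 # Adam step size
    eta=1.0,                 # schedule multiplier from the AdamW paper
    beta1=0.9, beta2=0.999, epsilon=1e-8,
    wd=0.01,                 # decoupled weight decay
    out=weight,              # write the updated weight in place
)
```

A real optimizer would typically wrap this in per-parameter state management and fold any bias correction into lr or eta.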
Unfortunately, it looks like there isn't a way to use this with gluon.Trainer directly right now without copying/modifying the BERTAdam code (or writing something similar from scratch). That would be a very nice thing to add to Gluon.
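As a rough starting point for the "writing something similar from scratch" route, here is a minimal sketch of a custom optimizer with decoupled weight decay, built only on the public mx.optimizer.Optimizer API and plain NDArray ops. The class name, hyperparameter defaults, and update code are my own illustration (not the BERTAdam implementation, and not an official MXNet API):

```python
import math
import mxnet as mx

@mx.optimizer.Optimizer.register
class AdamW(mx.optimizer.Optimizer):
    """Adam with decoupled weight decay (illustrative sketch only)."""

    def __init__(self, learning_rate=1e-3, beta1=0.9, beta2=0.999,
                 epsilon=1e-8, **kwargs):
        super(AdamW, self).__init__(learning_rate=learning_rate, **kwargs)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon

    def create_state(self, index, weight):
        # First and second moment estimates (m, v) per parameter array.
        return (mx.nd.zeros(weight.shape, weight.context, dtype=weight.dtype),
                mx.nd.zeros(weight.shape, weight.context, dtype=weight.dtype))

    def update(self, index, weight, grad, state):
        self._update_count(index)
        lr = self._get_lr(index)
        wd = self._get_wd(index)
        t = self._index_update_count[index]

        grad = grad * self.rescale_grad
        if self.clip_gradient is not None:
            grad = mx.nd.clip(grad, -self.clip_gradient, self.clip_gradient)

        mean, var = state
        mean[:] = self.beta1 * mean + (1.0 - self.beta1) * grad
        var[:] = self.beta2 * var + (1.0 - self.beta2) * grad * grad

        # Bias-corrected step size, as in Adam.
        step = lr * math.sqrt(1.0 - self.beta2 ** t) / (1.0 - self.beta1 ** t)

        # Decoupled weight decay: wd acts on the weight directly instead of
        # being folded into the gradient as in plain Adam + L2 regularization.
        weight[:] -= step * mean / (mx.nd.sqrt(var) + self.epsilon) + lr * wd * weight
```

Because it is registered, it could then be passed to gluon.Trainer by name, e.g. mx.gluon.Trainer(net.collect_params(), 'adamw', {'learning_rate': 1e-3, 'wd': 0.01}), again only as a sketch of what an AdamW in Gluon could look like.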
Please let me know if you get this working, as I'd love to be able to use that as well.
Upvotes: 1