Evaluation is very slow and takes a lot of time with the HF Trainer

Question

I'm training a huge self-supervised model. When I tried to train on the complete dataset, it threw CUDA OOM errors. To fix that, I decreased the batch size and added gradient accumulation along with eval accumulation steps. It no longer throws CUDA OOM errors, but evaluation has become much slower.

With the HF Trainer, I set eval_accumulation_steps to 1, and evaluation is now extremely slow. Is there any workaround for this? I'm using a per-device batch size of 16 with gradient accumulation steps of 4.
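
For reference, here is a minimal sketch of the settings described above, written as keyword arguments. The key names match parameters of `transformers.TrainingArguments`; treating the batch size of 16 as the training batch size is an assumption, and the `output_dir` value in the comment is a placeholder.

```python
# Settings from the question, expressed as keyword arguments for
# transformers.TrainingArguments (the keys are real parameter names).
training_kwargs = {
    # Assumption: the batch size of 16 refers to the training batch size.
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 4,
    # With a value of 1, predictions are moved from the GPU to the CPU
    # after every evaluation step, which saves GPU memory but is slower.
    "eval_accumulation_steps": 1,
}

# Effective number of training samples per device per optimizer step.
effective_train_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_train_batch)

# These would be passed to the Trainer roughly like this
# ("out" is a placeholder directory):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="out", **training_kwargs)
```

Note that gradient accumulation only affects training; the evaluation speed depends on the eval batch size and on `eval_accumulation_steps`.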

Answers (0)