Reputation: 51
I'm new to machine learning and I'm facing an issue where I want to increase the number of training epochs, but .train() only runs 3 epochs. What am I doing wrong?
This is my dataset:
> DatasetDict({
>     train: Dataset({
>         features: ['text', 'label'],
>         num_rows: 85021
>     })
>     test: Dataset({
>         features: ['text', 'label'],
>         num_rows: 15004
>     })
> })
and its features:
> {'label': ClassLabel(num_classes=20, names=['01. AGRI', '02. ALIM',
>  '03. CHEMFER', '04. ATEX', '05. MACH', '06. MARNAV', '07. CONST',
>  '08. MINES', '09. DOM', '10. TRAN', '11. ARARTILL', '12. PREELEC',
>  '13. CER', '14. ACHIMI', '15. ECLA', '16. HABI', '17. ANDUS',
>  '18. ARBU', '19. CHIRUR', '20. ARPA'], id=None),
>  'text': Value(dtype='string', id=None)}
My Trainer:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
This is what my .train() call shows:
***** Running training *****
  Num examples = 85021
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 31884
| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.994300      | 0.972638        | 0.711610 |
| 2     | 0.825400      | 0.879027        | 0.736337 |
| 3     | 0.660800      | 0.893457        | 0.744401 |
I would like to continue training beyond 3 epochs to increase accuracy and further decrease the training and validation loss. I tried changing num_train_epochs=10, as you can see below, but nothing changes.
Here is most of my code:
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=10,             # total number of training epochs
    per_device_train_batch_size=8,   # batch size per device during training
    per_device_eval_batch_size=16,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)
### Metrics
import numpy as np
from datasets import load_metric

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
### Trainer
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")
Upvotes: 2
Views: 3493
Reputation: 51
I found the issue: I had defined training_args twice in my code. The second definition came right before the Trainer, so the Trainer was reading its arguments from the definition that did not set num_train_epochs.
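In other words, the second assignment silently replaced the first, so the Trainer never saw num_train_epochs=10. A minimal sketch of the shadowing, using just the two definitions from the question:

from transformers import TrainingArguments

# First definition: asks for 10 epochs
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=10,
)

# Second definition further down re-assigns the same name and does not set
# num_train_epochs, so the library default of 3 epochs is what the Trainer gets
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

print(training_args.num_train_epochs)  # 3.0 -- the 10 from the first definition is gone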
Code should be:
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=10,             # total number of training epochs
    per_device_train_batch_size=8,   # batch size per device during training
    per_device_eval_batch_size=16,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)
### Metrics
import numpy as np
from datasets import load_metric

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
After this part you can call the trainer.
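For example, reusing the Trainer setup from the question with the single training_args defined above, it looks something like this:

from transformers import Trainer

trainer = Trainer(
    model=model,                                 # the classification model being fine-tuned
    args=training_args,                          # the one and only TrainingArguments
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()  # the log header should now report Num Epochs = 10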
Upvotes: 0