Reputation: 517
SPECS:
OS: Windows 10
CUDA: 10.1
GPU: RTX 2060 6GB VRAM (x2)
RAM: 32GB
Tutorial: https://huggingface.co/blog/how-to-train
Hello, I am trying to train my own language model and have run into memory issues. I ran this code in PyCharm on my computer and then tried to replicate it in my Colab Pro notebook.
from transformers import RobertaConfig, RobertaTokenizerFast, RobertaForMaskedLM, LineByLineTextDataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Small RoBERTa: 6 hidden layers, 12 attention heads, 60k-token vocabulary.
config = RobertaConfig(vocab_size=60000, max_position_embeddings=514, num_attention_heads=12,
                       num_hidden_layers=6, type_vocab_size=1)

tokenizer = RobertaTokenizerFast.from_pretrained("./MODEL DIRECTORY", max_len=512)
model = RobertaForMaskedLM(config=config)

print("making dataset")
# LineByLineTextDataset reads and tokenizes the entire file into memory up front.
dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="./total_text.txt", block_size=128)

print("making collator")
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# per_gpu_train_batch_size is deprecated; per_device_train_batch_size replaces it,
# and prediction_loss_only now lives in TrainingArguments rather than Trainer.
training_args = TrainingArguments(output_dir="./MODEL DIRECTORY", overwrite_output_dir=True,
                                  num_train_epochs=1, per_device_train_batch_size=64,
                                  save_steps=10000, save_total_limit=2, prediction_loss_only=True)

print("Building trainer")
trainer = Trainer(model=model, args=training_args, data_collator=data_collator, train_dataset=dataset)

trainer.train()
trainer.save_model("./MODEL DIRECTORY")
"./total_text.txt"
being a 1.7GB text file.
In PyCharm, this code builds the dataset and then throws an error saying that my preferred GPU was running out of memory, and that Torch was already using 3.7GiB of memory.
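A batch of 64 sequences is a lot for a 6GB card. The usual first fix is to shrink the batch that actually hits the GPU. A minimal sketch (the specific numbers here are assumptions, not tuned values): cut per_device_train_batch_size and recover the effective batch size of 64 through gradient_accumulation_steps, optionally enabling fp16 to roughly halve activation memory:

training_args = TrainingArguments(
    output_dir="./MODEL DIRECTORY",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=8,   # 8 sequences per step instead of 64
    gradient_accumulation_steps=8,   # 8 x 8 = effective batch size of 64
    fp16=True,                       # mixed precision; supported on RTX cards
    save_steps=10000,
    save_total_limit=2,
)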
I tried:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
so that Torch would have to use my CPU and not my GPU. It still threw the same GPU memory error. So, accepting that Torch was, for the time being, forcing itself onto my GPU, I decided to move to Colab.
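A likely reason the environment variable had no effect: CUDA reads CUDA_VISIBLE_DEVICES once, when it is first initialized, so setting it after torch has already touched the GPU is silently ignored. A minimal sketch of the two ways to keep the Trainer on the CPU, assuming a transformers version that still has the no_cuda flag:

import os
# Option 1: hide the GPUs *before* torch initializes CUDA, i.e. set this at the
# very top of the script, before importing torch or transformers.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Option 2: ask the Trainer for CPU explicitly via TrainingArguments.
from transformers import TrainingArguments
training_args = TrainingArguments(output_dir="./MODEL DIRECTORY", no_cuda=True)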
Colab has different issues with my code: it does not have the memory to build the dataset and crashes due to RAM shortages. I purchased a Pro account and increased the usable RAM to 25GB; still memory shortages.
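The RAM crash points at LineByLineTextDataset, which tokenizes the entire 1.7GB file into Python objects before training starts. A minimal sketch of a lower-memory alternative, assuming the separate datasets library is installed: its "text" loader memory-maps the corpus through Apache Arrow, and map() caches tokenized batches to disk instead of holding them all in RAM.

from datasets import load_dataset
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("./MODEL DIRECTORY", max_len=512)

# The file is memory-mapped, not read into RAM wholesale.
raw = load_dataset("text", data_files={"train": "./total_text.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Batched map tokenizes chunk by chunk and caches the result on disk;
# the resulting dataset can be passed to Trainer as train_dataset.
dataset = raw["train"].map(tokenize, batched=True, remove_columns=["text"])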
Cheers!
Upvotes: 0
Views: 839
Reputation: 517
I came to the conclusion that my text file for training was way too big. In the other examples I found, the training text was around 300MB, not 1.7GB. In both cases I was asking PyCharm and Colab to pull off a very resource-expensive task.
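If you want to test the pipeline before committing to the full corpus, one low-tech workaround is to carve off a slice around that 300MB mark and train on it first. A minimal sketch; "subset_text.txt" is a hypothetical output name:

# Copy roughly the first 300MB of the corpus into a smaller training file.
limit = 300 * 1024 * 1024  # ~300MB, matching the size of the working examples
written = 0
with open("./total_text.txt", "r", encoding="utf-8") as src, \
     open("./subset_text.txt", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line)
        written += len(line.encode("utf-8"))
        if written >= limit:
            break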
Upvotes: 0