林芷翎

Reputation: 1

OutOfMemoryError with PatchCore Training on 23.67 GiB GPU

I’m training a PatchCore model with an image size of 128x512 on a GPU with 23.67 GiB of memory, but training keeps failing with the following error:

CUDA Version: 12.4
PyTorch Version: 2.5.1

OutOfMemoryError: CUDA out of memory. Tried to allocate 2.17 GiB. GPU 0 has a total capacity of 23.67 GiB of which 47.88 MiB is free. Including non-PyTorch memory, this process has 23.62 GiB memory in use. Of the allocated memory 23.29 GiB is allocated by PyTorch, and 15.45 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management.

Configuration (yaml):

data:
  class_path: anomalib.data.Folder
  init_args:
    name: train_data
    root: ""
    image_size:
      - 128
      - 512
    normal_dir: ""
    abnormal_dir: ""
    normal_test_dir: ""
    mask_dir: ""
    normal_split_ratio: 0
    extensions: [".png"]
    train_batch_size: 4
    eval_batch_size: 4
    num_workers: 8
    train_transform:
      class_path: torchvision.transforms.v2.Compose
      init_args:
        transforms:
          - class_path: torchvision.transforms.v2.RandomAdjustSharpness
            init_args:
              sharpness_factor: 0.7
              p: 0.5
          - class_path: torchvision.transforms.v2.RandomHorizontalFlip
            init_args:
              p: 0.5
          - class_path: torchvision.transforms.v2.Resize
            init_args:
              size: [128, 512]
          - class_path: torchvision.transforms.v2.Normalize
            init_args:
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
    eval_transform:
      class_path: torchvision.transforms.v2.Compose
      init_args:
        transforms:
          - class_path: torchvision.transforms.v2.Resize
            init_args:
              size: [128, 512]
          - class_path: torchvision.transforms.v2.Normalize
            init_args:
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]

model:
  class_path: anomalib.models.Patchcore
  init_args:
    backbone: wide_resnet50_2
    layers:
      - layer2
      - layer3
    pre_trained: true
    coreset_sampling_ratio: 0.1
    num_neighbors: 9

Steps I’ve Tried:

Lowering the batch size: I reduced the batch size to as low as 1, but the issue persists.

Checking for memory fragmentation: I followed the suggestion in the error message and set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True (see the snippet after this list), but it did not solve the problem.

Ensuring no memory leakage: I verified with nvidia-smi that no other processes are consuming GPU memory, yet the allocated memory still maxes out during training.
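
For completeness, this is how I set the allocator option (a minimal sketch; the variable has to be in the environment before the first CUDA allocation in the process):

import os

# Must be set before PyTorch touches the GPU, e.g. at the very top of the
# training script or exported in the shell that launches it.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"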

Questions:

Are there specific optimizations for PatchCore or PyTorch that can help reduce memory usage?

Upvotes: 0

Views: 25

Answers (1)

deep-learnt-nerd

Reputation: 189

Have you tried using mixed precision?

You can usually enable it by passing precision="16-mixed" to the Lightning trainer. anomalib seems to have implemented a way to use it during deployment as well.
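
For example (a minimal sketch, assuming anomalib's Engine forwards extra trainer keyword arguments such as precision to the underlying Lightning Trainer):

from anomalib.engine import Engine

# Assumption: extra keyword arguments are passed through to
# lightning.pytorch.Trainer, so precision can be set here directly.
# "16-mixed" runs the forward passes under float16 autocast, which can
# substantially reduce the memory used by the backbone's feature maps.
engine = Engine(precision="16-mixed")

# Training itself is unchanged:
# engine.fit(model=model, datamodule=datamodule)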

Upvotes: 0
