RahilaRahi

Reputation: 57

Test Intel Extension for PyTorch (IPEX) in multiple-choice from huggingface/transformers

I am trying out a Hugging Face example with the SWAG dataset: https://github.com/huggingface/transformers/tree/master/examples/pytorch/multiple-choice

I would like to use Intel Extension for PyTorch in my code to improve performance.

Here I am using the script that does not use the Trainer API (run_swag_no_trainer).

In run_swag_no_trainer.py, I made some changes to use IPEX. The code before the change is given below:

device = accelerator.device
model.to(device)

After adding IPEX:

import intel_pytorch_extension as ipex

device = ipex.DEVICE
model.to(device)
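Note that recent IPEX releases renamed the package and dropped the `ipex.DEVICE` API. A minimal sketch of the current CPU usage, assuming `intel_extension_for_pytorch` is installed (the tiny `torch.nn.Linear` model here is only illustrative):

```python
import torch
import intel_extension_for_pytorch as ipex  # current package name

model = torch.nn.Linear(4, 2)  # placeholder for the real model

# Keep the model on CPU and let IPEX apply operator-level optimizations.
device = torch.device("cpu")
model.to(device)
model.eval()
model = ipex.optimize(model)  # optionally: ipex.optimize(model, dtype=torch.bfloat16)
```

For training, `ipex.optimize(model, optimizer=optimizer)` returns an optimized (model, optimizer) pair instead.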

While running the command below, it is taking too much time.

export DATASET_NAME=swag

accelerate launch run_swag_no_trainer.py \
  --model_name_or_path bert-base-cased \
  --dataset_name $DATASET_NAME \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/$DATASET_NAME/

Is there any other method to test the same with Intel IPEX?

Upvotes: 2

Views: 543

Answers (1)

Rahila T - Intel

Reputation: 862

First, you have to understand which factors actually increase the running time:

  1. Large input size.
  2. Input data with a shifted (non-zero) mean and unnormalized values.
  3. Large network depth and/or width.
  4. A large number of epochs.
  5. A batch size that does not fit in the physically available memory.
  6. A very small or very high learning rate.

To run faster, address the factors above:

  1. Reduce the input size to dimensions that ensure no loss of important features.
  2. Always preprocess the input to make it zero-mean, and normalize it by dividing by the standard deviation or by the difference between the max and min values.
  3. Keep the network depth and width neither too high nor too low, or use a standard architecture that is theoretically proven.
  4. Watch the number of epochs: if the error or accuracy is no longer improving beyond a defined threshold, there is no need to run more epochs.
  5. Choose the batch size based on the available memory and the number of CPUs/GPUs. If a batch cannot be fully loaded into memory, processing slows down due to heavy paging between memory and the filesystem.
  6. Determine an appropriate learning rate by trying several and keeping the one that gives the best reduction in error per epoch.
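Points 2 and 4 can be sketched in plain Python; the names `normalize` and `should_stop` are illustrative, not from the script:

```python
def normalize(xs):
    """Zero-mean, unit-variance preprocessing (point 2)."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    if std == 0:
        return [0.0] * len(xs)
    return [(x - mean) / std for x in xs]


def should_stop(losses, threshold=1e-3, patience=2):
    """Stop training when the last `patience` epoch-to-epoch
    improvements are all below `threshold` (point 4)."""
    if len(losses) < patience + 1:
        return False
    recent = losses[-(patience + 1):]
    improvements = [recent[i] - recent[i + 1] for i in range(patience)]
    return all(imp < threshold for imp in improvements)
```

Checking `should_stop` after each epoch avoids paying for extra epochs that no longer reduce the error.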

Upvotes: 2
