gggggggggggggggg

Reputation: 11

Using GCSFuse vs NFS share for custom training on Vertex AI

We are currently using GCS Fuse with Google Cloud Storage during our training and are seeing very slow performance. The bug seems to be on Google's side, and they are actively working on a fix for GCS Fuse.

Has anyone tried setting up an NFS share for custom training on Vertex AI? What kind of performance benefit would it provide?

Upvotes: 1

Views: 960

Answers (1)

guillaume blaquiere

Reputation: 76043

NFS will indeed improve file access latency compared to Cloud Storage. However, access to NFS (Filestore) is more difficult than access to Cloud Storage: you must have a VM to reach the private IP of Filestore.

That said, you could continue to use Cloud Storage; it's a common and recommended pattern. But you have to follow some best practices:

  • The bucket must be in the same region as your training process, to minimize network latency and avoid egress fees.
  • Read objects from the bucket only at the beginning of each epoch, i.e. download the objects, store them locally (on a local disk, in memory, ...), and then perform your training loops. Do not access the objects through GCSFuse as a local file system during the training loop; otherwise it's too slow.
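
As a rough illustration of the second point, here is a minimal sketch that copies a dataset to local disk once, before the training loop starts. It uses only the Python standard library and assumes the bucket is visible as a directory (for example a GCSFuse mount under /gcs/); the helper name `stage_to_local` and the paths in the usage comment are hypothetical. The same idea applies if you download with the Cloud Storage client library instead.

```python
import shutil
from pathlib import Path


def stage_to_local(source_dir, local_dir):
    """Copy every file under source_dir into local_dir and return the local paths.

    Files already present locally with the same size are skipped, so calling
    this again (e.g. at the start of a later epoch) does not re-copy them.
    """
    source_dir, local_dir = Path(source_dir), Path(local_dir)
    staged = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        dst = local_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if not dst.exists() or dst.stat().st_size != src.stat().st_size:
            shutil.copyfile(src, dst)  # one sequential read per object
        staged.append(dst)
    return staged


# Hypothetical usage inside a training script (paths are assumptions):
#   files = stage_to_local("/gcs/my-bucket/train", "/tmp/train_cache")
#   for epoch in range(num_epochs):
#       for path in files:          # reads now hit local disk only
#           ...
```

The point of the design is that the mount (or the network) is touched once per file, in a single sequential pass, and every read inside the training loop goes to local storage.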

Upvotes: 1
