ken

Reputation: 56

Performance benchmarks for attaching read-only disks to Google Compute Engine

Has anyone benchmarked the performance of attaching a single read-only disk to multiple Google Compute Engine instances (i.e., the same disk attached in read-only mode)?

The Google documentation (https://cloud.google.com/compute/docs/disks/persistent-disks#use_multi_instances) indicates that it is OK to attach the same disk to multiple instances, and personal experience has shown it to work at a small scale (5 to 10 instances), but soon we will be running a job across 500+ machines (GCE instances). We would like to know how performance scales as the number of parallel attachments grows, and as the bandwidth of those attachments grows. We currently pull down large blocks of data (read-only) from Google Cloud Storage buckets, and are wondering about the merits of switching to a standard persistent disk configuration. This involves terabytes of data, so we don't want to change course willy-nilly.

One important consideration: it is likely that code on each of the 500+ machines will try to access the same file (400 MB) at the same time. How do buckets and attached disks compare in that case? Maybe the answer is obvious, which would save us having to set up a rigorous benchmarking system (across 500 machines) ourselves. Thanks.

Upvotes: 3

Views: 597

Answers (2)

Chris Madden

Reputation: 2650

Both GCS and Persistent Disk implement various forms of caching and server-side scaling. If you are reading 400 MB objects concurrently from many VMs, I'd say GCS is the simplest and cheapest solution for getting that data.
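For illustration, here is a minimal sketch of what each VM's fetch might look like with the google-cloud-storage Python client. The bucket and object names are placeholders, and it assumes the VM's default service-account credentials:

    # Minimal sketch: each VM pulls the shared object straight from GCS.
    # Assumes the google-cloud-storage client library and default VM
    # service-account credentials; bucket/object names are placeholders.
    from google.cloud import storage

    def fetch_shared_blob(bucket_name: str, blob_name: str, dest_path: str) -> None:
        client = storage.Client()
        bucket = client.bucket(bucket_name)
        blob = bucket.blob(blob_name)
        blob.download_to_filename(dest_path)  # streams the object to local disk

    fetch_shared_blob("my-data-bucket", "shared/input-400mb.bin", "/tmp/input.bin")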

If you wanted to use Persistent Disk (PD), you can attach a PD read-only to up to 10 VMs and receive the full rated disk performance on each VM. In your case with 500 VMs you'd want to clone your source disk 50 times, and then read-only attach those 50 disks to 10 VMs each. You'd get lower-latency access, especially for smaller random reads, but for sequential throughput or large block reads where you have enough concurrency it will likely be comparable to GCS.
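A rough sketch of that fan-out, driving the gcloud CLI from Python, might look like the following. The zone, the disk and instance names, and the snapshot-based cloning path are all assumptions for illustration:

    # Rough sketch of the 50-clone fan-out via the gcloud CLI.
    # Disk/instance naming, the zone, and the snapshot-based cloning
    # path are all assumptions for illustration.
    import subprocess

    ZONE = "us-central1-a"          # placeholder zone
    SOURCE_DISK = "source-data"     # placeholder disk holding the data
    NUM_CLONES = 50
    VMS_PER_CLONE = 10

    def run(*args: str) -> None:
        subprocess.run(["gcloud", "compute", *args, "--zone", ZONE], check=True)

    # 1. Snapshot the source disk once.
    run("disks", "snapshot", SOURCE_DISK, "--snapshot-names", f"{SOURCE_DISK}-snap")

    for i in range(NUM_CLONES):
        clone = f"{SOURCE_DISK}-clone-{i}"
        # 2. Materialize each clone from the snapshot.
        run("disks", "create", clone, "--source-snapshot", f"{SOURCE_DISK}-snap")
        # 3. Read-only attach the clone to its group of 10 VMs.
        for j in range(VMS_PER_CLONE):
            vm = f"worker-{i * VMS_PER_CLONE + j}"   # placeholder VM names
            run("instances", "attach-disk", vm, "--disk", clone, "--mode", "ro")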

I blogged recently about read-only persistent disks in case a deeper dive is useful.

Upvotes: 0

Stephen Weinberg

Reputation: 53398

Persistent disks on GCE should have consistent performance. Currently that is 12 MB/s and 30 IOPS per 100 GB of volume size for a standard persistent disk:

https://cloud.google.com/compute/docs/disks/persistent-disks#pdperformance
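As a quick worked example under that formula (my arithmetic, not from the docs): a 500 GB standard PD would be rated at about 60 MB/s and 150 IOPS, and a 1 TB disk at about 120 MB/s and 300 IOPS.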

Using it on multiple instances should not change the disk's overall performance. It will, however, make it easier to reach those limits, since you don't need to worry about a single instance's maximum read speed. Accessing the same data many times at once might matter, though; I do not know how either persistent disks or GCS handle that kind of contention.

If it is only a 400 MB file that is in contention, it may make sense to benchmark the fastest method of delivering it separately. One possible solution is to make duplicates of your critical file and pick which one you access at random. This should cause fewer nodes to contend for each copy.
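A tiny sketch of that random-replica idea, assuming the file has been duplicated N times on the attached disk (the paths and copy count are placeholders):

    # Sketch: the 400 MB file is duplicated NUM_COPIES times on the disk
    # (e.g. critical-file.0 .. critical-file.7) and each reader picks one
    # copy at random, spreading contention across the duplicates.
    import random

    NUM_COPIES = 8  # assumed number of duplicates of the file

    def pick_replica(base_path: str = "/mnt/pd/critical-file") -> str:
        """Return the path of one randomly chosen duplicate."""
        return f"{base_path}.{random.randrange(NUM_COPIES)}"

    with open(pick_replica(), "rb") as f:
        data = f.read()  # ~500 readers now spread across NUM_COPIES copies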

Duplicating the critical file means a bigger disk, which in turn also improves your IO performance. If you already intended to increase your volume size for better performance, the copies are free.

Upvotes: 1
