Reputation: 476
I am trying to migrate a fairly large amount of data from GCS to AppEngine via the task queue and 20 backend instances. The issue is that the new Cloud Storage library does not seem to respect the urlfetch timeout, or something else is going on.
import cloudstorage as gcs
gcs.set_default_retry_params(gcs.RetryParams(urlfetch_timeout=60,
                                              max_retry_period=300))
...
with gcs.open(fn, 'r') as fp:
raw_gcs_file = fp.read()
The above works just fine when the queue is paused and I run one task at a time, but when I try to run 20 concurrent tasks against the 20 backends, the following starts happening:
I 2013-07-20 00:18:16.418 Got exception while contacting GCS. Will retry in 0.2 seconds.
I 2013-07-20 00:18:16.418 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:21.553 Got exception while contacting GCS. Will retry in 0.4 seconds.
I 2013-07-20 00:18:21.554 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:25.728 Got exception while contacting GCS. Will retry in 0.8 seconds.
I 2013-07-20 00:18:25.728 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:31.428 Got exception while contacting GCS. Will retry in 1.6 seconds.
I 2013-07-20 00:18:31.428 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:34.301 Got exception while contacting GCS. Will retry in -1 seconds.
I 2013-07-20 00:18:34.301 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:34.301 Urlfetch retry 5 failed after 22.8741798401 seconds total
How can it fail after only 22 seconds when max_retry_period is 300? It doesn't seem to be using the retry params at all.
Upvotes: 1
Views: 466
Reputation: 554
This is a bug in the GCS client library. It will be fixed soon. Thanks!
Your hack will work. But if it still times out frequently, you can try fp.read(size=some_size). If your files are large, a 32 MB response (the urlfetch response size limit) with a 90 second deadline works out to a transfer rate of about 364 KB/s.
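For what it's worth, a minimal sketch of that chunked-read approach might look like the following; the 1 MB chunk size and the helper name are illustrative choices, not values from the library or the question, so tune the size to your files and deadline.

import cloudstorage as gcs

CHUNK_SIZE = 1024 * 1024  # assumed 1 MB per read; adjust as needed

def read_in_chunks(filename):
    # Read the object in fixed-size pieces so each underlying
    # urlfetch call stays well under the 32 MB response limit.
    parts = []
    with gcs.open(filename, 'r') as fp:
        while True:
            chunk = fp.read(CHUNK_SIZE)
            if not chunk:
                break
            parts.append(chunk)
    return ''.join(parts)

Smaller reads mean more round trips, but each request has far less data to move before the urlfetch deadline hits.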
Upvotes: 1