Reputation: 1185
I have a text file (say, "X") stored on GCS, created and updated with the GCS Client Library. I use GAE Python. Every time a user of my website adds some data, I add a Task (taskqueue.Task) to the "default" queue that performs some actions, including modifying file "X".
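For context, the enqueueing step looks roughly like this (a minimal sketch; the handler URL /t is taken from the log below, and the payload fields are only illustrative):

    from google.appengine.api import taskqueue

    # One worker task per user addition; the /t handler does the
    # read-modify-write on file "X". Payload fields here are placeholders.
    task = taskqueue.Task(url='/t', params={'command': 'ADD', 'city_id': 'SVL'})
    task.add(queue_name='default')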
Sometimes, I get the following error in the logs:
E 2014-07-20 03:19:06.238 500 3KB 430ms /t
0.1.0.2 - - [19/Jul/2014:14:49:06 -0700] "POST /t HTTP/1.1" 500 2569 "http://www.myappdomain.com/p" "AppEngine-Google; (+http://code.google.com/appengine)" "www.myappdomain.com" ms=430 cpu_ms=498 cpm_usd=0.000287 queue_name=default task_name=14629523467445182169 instance=00c61b117c48b4db44a58e0d454310843e7848 app_engine_release=1.9.7 trace_id=3db3eb580b76133e90947539c0446910
I 03:19:05.813 [class TaskQueueWorker] work=[sitemap_index_entry]
I 03:19:05.813 country_id=[US] country_name=[USA] state_id=[CA] state_name=[California] city_id=[SVL] city_name=[Sunnyvale]
I 03:19:05.836 locality_id_old=[-1] locality_id_new=[28]
I 03:19:05.879 locality_name_old=[] locality_name_new=[XYZ]
I 03:19:05.879 command=[ADD]
E 03:19:06.207 File on GCS has changed while reading.
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~myappdomain/1.377368272328585247/main_v3.py", line 15259, in post
gcs_file = gcs.open (index_filename, mode='r')
File "/base/data/home/apps/s~myappdomain/1.377368272328585247/cloudstorage/cloudstorage_api.py", line 94, in open
buffer_size=read_buffer_size)
File "/base/data/home/apps/s~myappdomain/1.377368272328585247/cloudstorage/storage_api.py", line 220, in __init__
check_response_closure()
File "/base/data/home/apps/s~myappdomain/1.377368272328585247/cloudstorage/storage_api.py", line 448, in _checker
self._check_etag(resp_headers.get('etag'))
File "/base/data/home/apps/s~myappdomain/1.377368272328585247/cloudstorage/storage_api.py", line 476, in _check_etag
raise ValueError('File on GCS has changed while reading.')
ValueError: File on GCS has changed while reading.
I 03:19:06.235 Saved; key: __appstats__:045800, part: 144 bytes, full: 74513 bytes, overhead: 0.002 + 0.004; link: http://www.myappdomain.com/_ah/stats/details?time=1405806545812
I suspect that multiple triggered tasks try to open and update file "X" at the same time, and that causes the above exception. Please suggest a way to lock access to the file so that only one task can modify it at a time (similar to a transaction).
Appreciate your help and guidance.
UPDATE
Another way to prevent the above problem could be to modify one of the following queue.yaml parameters for the queue:
bucket_size
OR
max_concurrent_requests
But I'm not sure which one to modify (both are shown in the sketch below for reference).
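For reference, both parameters live in the queue definition in queue.yaml (the values below are placeholders):

    queue:
    - name: default
      rate: 5/s
      bucket_size: 5              # burst capacity of the token bucket
      max_concurrent_requests: 8  # cap on tasks executing at the same time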
Upvotes: 1
Views: 2172
Reputation: 15375
It is also possible to rely on GCS itself by using preconditions.
A precondition lets a write succeed only if the object has not changed since you last read it. See the docs:
Preconditions are often used in mutating requests — uploads, deletes, copies, or metadata updates — to prevent race conditions. Race conditions can arise when the same request is sent repeatedly or when independent processes interfere with each other. For example, multiple request retries after a network interruption, or users performing a read-modify-write operation on the same object can create race conditions.
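A rough sketch of that read-modify-write pattern, using the standalone google-cloud-storage client rather than the GAE cloudstorage library from the traceback (bucket and object names are placeholders):

    from google.api_core.exceptions import PreconditionFailed
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket('my-bucket').blob('X')

    blob.reload()  # fetches metadata, including the current generation number
    data = blob.download_as_bytes() + b'new sitemap entry\n'

    try:
        # The upload succeeds only if the object is still at the generation
        # we read; otherwise GCS rejects it with HTTP 412.
        blob.upload_from_string(data, if_generation_match=blob.generation)
    except PreconditionFailed:
        pass  # another task won the race: re-read, re-apply the change, retry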
Upvotes: 0
Reputation: 9116
A task queue with max_concurrent_requests = 1 should ensure that only one edit is made to the file at a time.
If you want to prevent too many tasks from running at once or to prevent datastore contention, use max_concurrent_requests.
max_concurrent_requests (push queues only) Sets the maximum number of tasks that can be executed at any given time in the specified queue. The value is an integer. By default, this directive is unset and there is no limit on the maximum number of concurrent tasks. One use of this directive is to prevent too many tasks from running at once or to prevent datastore contention.
Restricting the maximum number of concurrent tasks gives you more control over your queue's rate of execution. For example, you can constrain the number of instances that are running the queue's tasks. Limiting the number of concurrent requests in a given queue allows you to make resources available for other queues or online processing.
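Concretely, that would look like this in queue.yaml (the rate value is a placeholder):

    queue:
    - name: default
      rate: 5/s
      max_concurrent_requests: 1  # serialize the tasks that modify file "X"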
Of course, you should build in logic that allows failed tasks to retry, etc., or you may end up with worse problems than you have now.
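Push queue tasks are retried automatically when the handler returns a non-2xx status, so one way to get retries is simply to let the failure surface, with a cap so a poison task cannot retry forever. A minimal sketch, assuming a hypothetical update_sitemap_index() that performs the read-modify-write:

    import logging
    import webapp2

    MAX_ATTEMPTS = 10  # arbitrary cap for illustration

    class TaskQueueWorker(webapp2.RequestHandler):
        def post(self):
            # App Engine sets this header on each task execution attempt.
            attempt = int(self.request.headers.get('X-AppEngine-TaskRetryCount', 0))
            try:
                update_sitemap_index(self.request.params)  # hypothetical worker
            except ValueError:
                if attempt >= MAX_ATTEMPTS:
                    logging.error('Giving up after %d attempts', attempt)
                    return  # 200 OK: stop retrying this task
                self.error(500)  # non-2xx: the queue re-enqueues with backoff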
Upvotes: 2