LA_

Reputation: 20409

How to count results from many GAE tasks?

I run a very large number of tasks that fetch some information and process it. Each task run produces an integer indicating how many portions of the information it retrieved. I would like to get the sum of these integers across all the tasks.

Currently I use memcache to store the sum:

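from google.appengine.api import memcache

def update_memcache_value(what, val, how_long=86400):
    # Read-modify-write: not atomic, so concurrent tasks can overwrite each other's updates
    value_old = get_memcache_value(what)
    memcache.set('system_' + what, value_old + val, time=how_long)

def get_memcache_value(what):
    # A missing (or evicted) key is treated as 0
    value = memcache.get('system_' + what)
    if not value:
        value = 0
    return int(value)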

update_memcache_value is called within each task, often more than once. But it looks like the data is frequently lost during the day. I could use NDB to store the same data, but that would require a lot of write ops. Is there a better way to store this counter?
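Besides memcache evicting keys, the get-then-set in update_memcache_value is not atomic: if two tasks read the same old value before either one writes, one of the updates is lost. The sketch below uses a plain Python dict as a hypothetical stand-in for memcache (it is not the App Engine API) to replay that interleaving deterministically, and contrasts it with a lock-protected increment, which plays the role an atomic server-side increment would play.

```python
import threading

# Hypothetical in-memory stand-in for memcache; NOT the App Engine API.
cache = {}

def get_value(key):
    # Missing keys count as 0, as in get_memcache_value
    return int(cache.get(key) or 0)

def lost_update_demo():
    """Replay two tasks interleaving a non-atomic read-modify-write."""
    cache.clear()
    old_a = get_value('system_count')   # task A reads 0
    old_b = get_value('system_count')   # task B also reads 0
    cache['system_count'] = old_a + 3   # task A writes 3
    cache['system_count'] = old_b + 5   # task B writes 5, overwriting A's update
    return cache['system_count']

_lock = threading.Lock()

def atomic_incr(key, delta):
    """Increment under a lock so no concurrent update can be lost."""
    with _lock:
        cache[key] = get_value(key) + delta
        return cache[key]

def atomic_demo():
    cache.clear()
    threads = [threading.Thread(target=atomic_incr, args=('system_count', d))
               for d in (3, 5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache['system_count']

print(lost_update_demo())  # 5, although 3 + 5 = 8 was reported
print(atomic_demo())       # 8
```

On App Engine itself, memcache.incr(key, delta, initial_value=0) performs the increment atomically on the memcache server, but it does not prevent eviction, so it only addresses the lost-update part of the problem.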

Upvotes: 0

Views: 59

Answers (2)

Nicholas Franceschina

Reputation: 6147

It sounds like you specifically want many tasks to each compute part of a sum and then have all of those parts reduced down to one number at the end, so you want to use MapReduce. Alternatively, you could use Pipelines directly, since MapReduce is actually built on top of it. If you're worried about write ops, though, you aren't going to be able to take advantage of App Engine's parallelism.
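The fan-out/fan-in shape described here can be sketched in plain Python: each task returns its own partial count, and a single reduce step sums them at the end, so no shared counter is ever updated concurrently. This only illustrates the pattern. The names run_task and sum_counts are hypothetical, and a thread pool stands in for App Engine tasks; it does not use the Pipelines or MapReduce APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def run_task(task_id):
    """Hypothetical task: fetch and process data, return how many portions it got.
    Here it just returns a deterministic fake count."""
    return task_id % 4

def sum_counts(task_ids, workers=8):
    # "Map": run the tasks in parallel, each producing its own partial count
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_task, task_ids))
    # "Reduce": combine the partial counts once, at the end
    return reduce(operator.add, partials, 0)

print(sum_counts(range(10)))  # 13
```

In the real libraries the tasks and the combining step would be pipeline stages rather than thread-pool calls, but the key property is the same: each partial result is produced once and combined once.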

Google I/O 2010 - Data pipelines with Google App Engine

https://www.youtube.com/watch?v=zSDC_TU7rtc

Pipelines Library

https://github.com/GoogleCloudPlatform/appengine-pipelines/wiki

MapReduce

https://cloud.google.com/appengine/docs/python/dataprocessing/

Upvotes: 2

jirungaray

Reputation: 1674

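Unfortunately, if your tasks run throughout the day, memcache is not an option: memcache entries can be evicted at any time, so a counter kept only there will eventually lose data.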

If you want to reduce write ops, you could keep a second, persistent counter and back up the memcache value to it every 100 tasks, or at whatever interval works for you.
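A minimal sketch of this batching idea, assuming a fast but volatile cache (a dict standing in for memcache) and a durable store (a dict standing in for a datastore entity). Both stand-ins and the names BatchedCounter and flush_every are hypothetical. Every task updates the fast counter, and the accumulated delta is written to the durable store only every flush_every updates, so durable writes drop to one per batch and an eviction loses at most the unflushed part.

```python
class BatchedCounter:
    """Hypothetical counter: cheap updates in a volatile cache, periodic durable writes."""

    def __init__(self, flush_every=100):
        self.flush_every = flush_every
        self.cache = {'pending': 0, 'updates': 0}  # stand-in for memcache (may be evicted)
        self.store = {'total': 0}                  # stand-in for a datastore entity
        self.durable_writes = 0

    def add(self, val):
        self.cache['pending'] += val
        self.cache['updates'] += 1
        if self.cache['updates'] >= self.flush_every:
            self.flush()

    def flush(self):
        # One durable write persists everything accumulated since the last flush
        self.store['total'] += self.cache['pending']
        self.durable_writes += 1
        self.cache['pending'] = 0
        self.cache['updates'] = 0

    def evict(self):
        # Simulate memcache dropping the keys: only the unflushed part is lost
        self.cache = {'pending': 0, 'updates': 0}

    def total(self):
        return self.store['total'] + self.cache['pending']


counter = BatchedCounter(flush_every=100)
for _ in range(250):
    counter.add(2)
print(counter.total(), counter.durable_writes)  # 500 2
counter.evict()
print(counter.total())  # 400: the 50 unflushed updates (worth 100) were lost
```

The sketch is single-threaded. On App Engine the cache increments would still need to be atomic (for example with memcache.incr) and the flush would need to run in a datastore transaction so that two tasks cannot flush the same batch.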

If you want to do this without spending write ops on every task, you could try backing up those results in third-party storage, for example a Google Spreadsheet via the Spreadsheets API. But that seems like overkill just to save some write ops (and it is less performant, though I guess that is not an issue for you).

Upvotes: 0
