Sebastian Küpers

Reputation: 241

Best practice to release memory after url fetch on appengine (python)

My problem is how best to release the memory that the response of an asynchronous URL fetch needs on App Engine. Here is what I basically do in Python:

from google.appengine.api import urlfetch

rpcs = []

for event in event_list:
    url = 'http://someurl.com'
    rpc = urlfetch.create_rpc()
    rpc.callback = create_callback(rpc)
    urlfetch.make_fetch_call(rpc, url)
    rpcs.append(rpc)

# Block until every fetch has completed and its callback has run.
for rpc in rpcs:
    rpc.wait()

In my test scenario it does that for 1500 requests. But I need an architecture that can handle even more within a short amount of time.

Then there is a callback function, which adds a task to a queue to process the results:

import json
from google.appengine.api import taskqueue

def event_callback(rpc):
    result = rpc.get_result()
    data = json.loads(result.content)
    # Hand the parsed payload off to a task queue for further processing.
    taskqueue.add(queue_name='name', url='url', params={'data': data})
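
(`create_callback` just wraps each RPC in a closure so that this handler can be invoked without arguments; roughly:)

def create_callback(rpc):
    # Return a no-argument closure; urlfetch runs it once the fetch
    # completes (triggered from rpc.wait()).
    return lambda: event_callback(rpc)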

My problem is that I make so many concurrent RPC calls that my instance exceeds its memory limit and gets killed: "Exceeded soft private memory limit with 159.234 MB after servicing 975 requests total"

I already tried three things:

del result
del data

and

result = None
data = None

and I ran the garbage collector manually after the callback function:

gc.collect()

But nothing seems to release the memory directly after a callback function has added its task to the queue, and so the instance crashes. Is there any other way to do it?

Upvotes: 3

Views: 771

Answers (2)

tesdal

Reputation: 2459

Use the task queue for the urlfetch calls as well: fan out to avoid exhausting memory, register named tasks, and pass the event_list cursor on to the next task. In such a scenario you might want to fetch and process within the same task instead of registering a new task for every result, especially if processing also includes datastore writes.
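
A rough sketch of that fan-out with ndb (`Event`, `process`, and the `/tasks/fetch_events` URL are illustrative stand-ins, not anything from the question):

from google.appengine.api import taskqueue, urlfetch
from google.appengine.ext import ndb

BATCH_SIZE = 50

class Event(ndb.Model):
    pass  # stand-in for the real event model

def process(content):
    pass  # stand-in for the real per-response processing

def fetch_events_batch(cursor_str, batch_num):
    cursor = ndb.Cursor(urlsafe=cursor_str) if cursor_str else None
    events, next_cursor, more = Event.query().fetch_page(
        BATCH_SIZE, start_cursor=cursor)

    # Fetch and process inside the task, so memory stays bounded by
    # one batch rather than by the whole event list.
    for event in events:
        result = urlfetch.fetch('http://someurl.com')
        process(result.content)

    if more:
        try:
            # A named task is accepted only once, which guards against
            # duplicate fan-out when a task gets retried.
            taskqueue.add(name='fetch-events-%d' % (batch_num + 1),
                          url='/tasks/fetch_events',
                          params={'cursor': next_cursor.urlsafe(),
                                  'batch': batch_num + 1})
        except (taskqueue.TaskAlreadyExistsError,
                taskqueue.TombstonedTaskError):
            pass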

I also find that ndb makes these async solutions more elegant.
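
For example, the fetch itself as an ndb tasklet (a sketch; `fetch_and_parse` is just an illustrative name):

import json

from google.appengine.ext import ndb

@ndb.tasklet
def fetch_and_parse(url):
    ctx = ndb.get_context()
    result = yield ctx.urlfetch(url)  # non-blocking fetch via ndb's event loop
    raise ndb.Return(json.loads(result.content))

# Kick off several fetches concurrently and gather the results.
futures = [fetch_and_parse('http://someurl.com') for _ in range(10)]
results = [f.get_result() for f in futures]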

Check out Brett Slatkin's talk on scalable apps, and perhaps pipelines.

Upvotes: 1

T. Steinrücken

Reputation: 469

You're taking the wrong approach here. Instead, put these URLs into a push queue, increase its rate to the desired value (default: 5/sec), and let each task handle one URL fetch (or a group of them). Please note that there's a safety limit of 3000 urlfetch API calls per minute (and one URL fetch might use more than one API call).
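
A minimal sketch of that setup (the queue name `fetch-queue` and the `/tasks/fetch` URL are illustrative):

from google.appengine.api import taskqueue, urlfetch
import webapp2

# queue.yaml -- raise the queue's rate above the 5/sec default:
#   queue:
#   - name: fetch-queue
#     rate: 20/s

def enqueue_fetches(urls):
    # One task per URL; the queue's rate setting throttles the fetches.
    for url in urls:
        taskqueue.add(queue_name='fetch-queue',
                      url='/tasks/fetch',
                      params={'url': url})

def process(content):
    pass  # stand-in for the real per-response processing

class FetchHandler(webapp2.RequestHandler):
    def post(self):
        # Each task fetches a single URL, so an instance only ever holds
        # one response in memory at a time.
        result = urlfetch.fetch(self.request.get('url'))
        process(result.content)

app = webapp2.WSGIApplication([('/tasks/fetch', FetchHandler)])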

Upvotes: 2
