LA_

Reputation: 20409

How to organize multiple url fetch calls with GAE?

I need to perform thousands of URL fetch calls during the day. All calls are the same; only the parameters change: way and date.

Currently I use multiple cron entries to execute such calls:

- description: get data
  url: /admin/getdata?d=way1,way2,way3,way4,...,way12
  schedule: every day 8:30

- description: get data
  url: /admin/getdata?d=way13,way14,way15,way16,...,way24
  schedule: every day 8:40

...

- description: get data
  url: /admin/getdata?d=way99,way100,way101,way102,...,way123
  schedule: every day 9:20

Then in my getdata handler I parse the received d parameter and perform multiple URL fetches:

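from google.appengine.api import urlfetch

for date_ in dates:
    for way in d:
        # Same request each time; only the way and date parameters change.
        response = urlfetch.fetch('http://example.com?way=' + way + '&date=' + date_, deadline=60, headers=headers, follow_redirects=True)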

But this doesn't help much: the 60 seconds allowed for a cron job request is still not enough.

I was thinking about running a cron job every ten minutes, but then I would have to store the possible ways and dates somewhere, mark the requests that were already executed, and reset those marks so that everything can run again the next day.

Is there a better way to do this?

Upvotes: 0

Views: 316

Answers (2)

GAEfan

Reputation: 11360

Or, use one cron job which spawns task queue jobs for all the other URLs. That can be done in the default module, for free. I would set a countdown parameter on each task to space them out, so that they do not spin up too many instances. It simplifies cron.yaml as well.
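A minimal sketch of that idea, assuming a hypothetical /admin/getone handler that performs a single URL fetch for one way and one date. The function names, the handler URL, and the 5-second spacing are illustrative assumptions, not something from the question. The cron handler would build the plan and hand it to the default push queue:

```python
def plan_tasks(ways, dates, spacing_seconds=5):
    """Return one task description per (date, way) pair, each delayed a bit more than the last."""
    plan = []
    for date_ in dates:
        for way in ways:
            plan.append({
                'url': '/admin/getone',  # hypothetical handler doing a single urlfetch
                'params': {'way': way, 'date': date_},
                # len(plan) is the index of this task, so the first one runs immediately.
                'countdown': len(plan) * spacing_seconds,
            })
    return plan


def enqueue_all(plan):
    """Add every planned task to the default push queue (App Engine runtime only)."""
    # Imported here so plan_tasks() can be tried without the App Engine SDK.
    from google.appengine.api import taskqueue
    for task in plan:
        taskqueue.add(url=task['url'], params=task['params'],
                      countdown=task['countdown'], method='GET')
```

Each task then gets its own request deadline, and the countdown values spread the fetches over time instead of firing them all at once.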

Upvotes: 1

Andrei Volgin

Reputation: 41089

A better way is to have just one cron job per day that fetches all URLs. All you need to do is target this cron job at a backend instance, which is not subject to the 60-second request deadline.

Use Modules to create such an instance, and add a "target" setting to your cron job.
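As a rough illustration, the cron entry might look like the fragment below. The module name "fetcher" is a made-up placeholder for whatever backend module you define, and the sketch assumes the getdata handler loops over all ways itself, so the d parameter is no longer passed:

```yaml
- description: get data
  url: /admin/getdata
  schedule: every day 8:30
  target: fetcher  # hypothetical module name
```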

Upvotes: 1
