So I made a cron job that updates my news aggregator app with new stories every minute. I should say I'm a complete novice with cron jobs and have very limited experience with GAE.
This is my folder structure:
This is what's in news.py:
import os

import feedparser
import jinja2
import webapp2

# Assumption: the Jinja2 templates live in the same directory as news.py.
template_dir = os.path.dirname(__file__)
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))

feed = ['https://news.google.co.in/news/section?cf=all&pz=1&ned=in&topic=e&ict=ln&output=rss&num=10']
feedlist = []

def render_str(template, **params):
    t = jinja_env.get_template(template)
    return t.render(params)

class CronTask(webapp2.RequestHandler):
    def get(self):
        self.redirect('/entertainment')

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.write(render_str('mainpage.html'))

class Entertainment(webapp2.RequestHandler):
    def get(self):
        # feedparser.parse() takes a single URL, so parse each feed in turn
        for url in feed:
            rssfeed = feedparser.parse(url)
            for news in rssfeed.entries:
                new_entry = {'title': news.title, 'url': news.link, 'publisheddate': news.published}
                feedlist.append(new_entry)
        self.redirect('/1/display')

class Display(webapp2.RequestHandler):
    def get(self, page_no):
        n = int(page_no)
        # five stories per page: page 1 is feedlist[0:5], page 2 is feedlist[5:10], ...
        list_to_be_displayed_here = feedlist[(n - 1) * 5:n * 5]
        # this is the last page if no story exists beyond it
        is_this_last = len(feedlist) <= n * 5
        self.response.write(render_str('/display.html', page_no=page_no, feedlist=list_to_be_displayed_here, is_this_last=is_this_last))

app = webapp2.WSGIApplication([('/', MainPage),
                               ('/entertainment', Entertainment),
                               ('/([0-9]+)/display', Display),
                               ('/crontask', CronTask)
                               ], debug=True)
I assume this is how cron.yaml is supposed to be set up:
cron:
- description: periodic update of news
  url: /crontask
  target: beta
  schedule: every 1 minute
This is app.yaml:
application: encoded-alpha-139800
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /static
  static_dir: static

- url: /crontask
  script: news.py

- url: /.*
  script: news.app

libraries:
- name: jinja2
  version: latest
display.html just displays the feeds' info. Since I didn't know how to implement the cursor() method, I implemented the rudimentary pagination you see in Display.get(), which slices feedlist.
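The slicing used in Display.get() can be pulled out into a small helper, which makes the page arithmetic easy to check without App Engine. This is only a sketch: the name `paginate` and the `per_page` parameter are hypothetical, and the default of five items per page matches the question's code.

```python
def paginate(items, page_no, per_page=5):
    """Return (page_items, is_last) for a 1-based page number.

    Same arithmetic as Display.get(): page 1 is items[0:5],
    page 2 is items[5:10], and so on.
    """
    start = (page_no - 1) * per_page
    end = page_no * per_page
    page_items = items[start:end]
    # Last page when nothing exists at index `end`, i.e. the list
    # has at most `end` elements (the original try/except IndexError test).
    is_last = len(items) <= end
    return page_items, is_last


if __name__ == "__main__":
    stories = ["story %d" % i for i in range(12)]
    print(paginate(stories, 1))  # first five stories, not the last page
    print(paginate(stories, 3))  # the remaining two stories, last page
```

Computing `is_last` from `len()` avoids the bare `except:` of the original, which would also swallow unrelated errors.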
When I run news.py, I get this traceback:
  File "C:\Program Files (x86)\Google\google_appengine\dev_appserver.py", line 83, in <module>
    _run_file(__file__, globals())
  File "C:\Program Files (x86)\Google\google_appengine\dev_appserver.py", line 79, in _run_file
    execfile(_PATHS.script_file(script_name), globals_)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 1040, in <module>
    main()
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 1033, in main
    dev_server.start(options)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 758, in start
    options.config_paths, options.app_id)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\application_configuration.py", line 831, in __init__
    module_configuration = ModuleConfiguration(config_path, app_id)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\application_configuration.py", line 127, in __init__
    self._config_path)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\application_configuration.py", line 424, in _parse_configuration
    config, files = appinfo_includes.ParseAndReturnIncludePaths(f)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\appinfo_includes.py", line 82, in ParseAndReturnIncludePaths
    appyaml = appinfo.LoadSingleAppInfo(appinfo_file)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\appinfo.py", line 2190, in LoadSingleAppInfo
    listener.Parse(app_info)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\yaml_listener.py", line 227, in Parse
    self._HandleEvents(self._GenerateEventParameters(stream, loader_class))
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\yaml_listener.py", line 178, in _HandleEvents
    raise yaml_errors.EventError(e, event_object)
google.appengine.api.yaml_errors.EventError: threadsafe cannot be enabled with CGI handler: news.py
  in "C:\Users\IBM_ADMIN\Downloads\7c\NewsAggregatorGAE\app.yaml", line 19, column 18
Is it because I'm trying to run almost the entire application through a cron job? Or is there something wrong with my settings, or my entire setup?
The problem is that your app.yaml file declares the /crontask handler as a CGI script, because its script value is a file path rather than a module.variable reference:
script: news.py
From the Request Handlers documentation:
When App Engine receives a web request for your application, it calls the handler script that corresponds to the URL, as described in the application's app.yaml configuration file. The Python 2.7 runtime supports the WSGI standard and the CGI standard for backwards compatibility. WSGI is preferred, and some features of Python 2.7 do not work without it. The configuration of your application's script handlers determines whether a request is handled using WSGI or CGI.
...
If you mark your application as thread-safe, concurrent requests will be enabled, which means that App Engine can dispatch multiple requests to each web server in parallel. To do so, set threadsafe: true in app.yaml. Concurrent requests are not available if any script handler uses CGI.
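The rule in that last quoted sentence amounts to a check over the `handlers:` list. Below is a rough, stdlib-only sketch of it. It assumes, as I recall from the SDK's appinfo.py, that a script value ending in `.py` or containing `/` is treated as a CGI file path while a dotted `module.variable` value is treated as WSGI; the function names `is_cgi_script` and `check_threadsafe` are hypothetical, not SDK APIs.

```python
def is_cgi_script(script):
    # Assumption: approximates the SDK's test for a CGI handler.
    # "news.py" or "dir/main.py" is a file path (CGI);
    # "news.app" means module "news", attribute "app" (WSGI).
    return script.endswith(".py") or "/" in script


def check_threadsafe(threadsafe, handlers):
    """Raise ValueError if threadsafe is on and any handler is CGI.

    `handlers` is a list of dicts shaped like the entries under
    `handlers:` in app.yaml.
    """
    if not threadsafe:
        return
    for handler in handlers:
        script = handler.get("script")
        if script and is_cgi_script(script):
            raise ValueError(
                "threadsafe cannot be enabled with CGI handler: %s" % script)


# The handlers from the question's app.yaml.
handlers = [
    {"url": "/static", "static_dir": "static"},
    {"url": "/crontask", "script": "news.py"},
    {"url": "/.*", "script": "news.app"},
]

if __name__ == "__main__":
    try:
        check_threadsafe(True, handlers)
    except ValueError as err:
        print(err)  # threadsafe cannot be enabled with CGI handler: news.py
    # Pointing /crontask at the WSGI app instead passes the check.
    fixed = [dict(h, script="news.app") if h.get("script") == "news.py" else h
             for h in handlers]
    check_threadsafe(True, fixed)
```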
Just make it a WSGI app and this error should go away:
script: news.app
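For reference, `script: news.app` tells the runtime to import the module `news` and use its module-level variable `app` as a WSGI application. `webapp2.WSGIApplication` is one such object, but any callable with the WSGI signature qualifies. The sketch below shows that interface with the standard library only, calling the app directly through `wsgiref` helpers instead of App Engine; the `app` body and the `call` helper are illustrative, not the question's news.py.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI application: the kind of object `news.app` must be."""
    path = environ.get("PATH_INFO", "/")
    body = ("You requested %s" % path).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(wsgi_app, path):
    """Invoke a WSGI app directly and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # write() callable required by WSGI; unused

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    status, body = call(app, "/crontask")
    print(status)                 # 200 OK
    print(body.decode("utf-8"))   # You requested /crontask
```

Because the question's WSGIApplication already routes `/crontask` to `CronTask`, pointing that handler at `news.app` is enough; no separate script is needed.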
Remember that the GAE cron service is nothing but a generator for GET requests to the configured URLs according to the configured schedule. From Scheduling Tasks With Cron for Python:
The App Engine Cron Service allows you to configure regularly scheduled tasks that operate at defined times or regular intervals. These tasks are commonly known as cron jobs. These cron jobs are automatically triggered by the App Engine Cron Service.
...
A cron job invokes a URL, using an HTTP GET request, at a given time of day. A cron job request is subject to the same limits as those for push task queues.
How your app executes the cron jobs really boils down to how it handles those requests.
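In the question, `CronTask` only redirects to `/entertainment` (webapp2's `redirect()` sends a 302 by default). As I understand the cron documentation, a job whose handler returns a status outside 200 to 299 is considered failed, so a redirect is not a reliable way to chain the real work. A safer pattern is to do the update inside the cron handler itself. The framework-free sketch below assumes: `fetch_stories` is a hypothetical stand-in for the feedparser loop, `STORIES` is an in-memory stand-in for real storage (a module-level list lives in one instance's memory, so a deployed app would more likely use the datastore), and `cron_app` is a hypothetical name. App Engine marks cron requests with the `X-Appengine-Cron: true` header, which appears in the WSGI environ as `HTTP_X_APPENGINE_CRON`.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory stand-in for persistent storage.
STORIES = []


def fetch_stories():
    """Hypothetical stand-in for parsing the RSS feed with feedparser."""
    return [{"title": "Example story", "url": "https://example.com/1"}]


def cron_app(environ, start_response):
    """Handle GET /crontask by doing the update directly (no redirect)."""
    if environ.get("PATH_INFO") != "/crontask":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    # App Engine sets this header on cron requests and strips it from
    # external requests, so it can be used to reject manual calls.
    if environ.get("HTTP_X_APPENGINE_CRON") != "true":
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"cron only"]
    known = set(s["url"] for s in STORIES)
    # Skip stories already stored, so repeated runs don't duplicate them.
    added = [s for s in fetch_stories() if s["url"] not in known]
    STORIES.extend(added)
    # A 2xx status tells the cron service the job succeeded.
    start_response("200 OK", [("Content-Type", "text/plain")])
    msg = "added %d, total %d" % (len(added), len(STORIES))
    return [msg.encode("utf-8")]


if __name__ == "__main__":
    for extra in ({}, {"HTTP_X_APPENGINE_CRON": "true"}):
        environ = dict(extra, PATH_INFO="/crontask")
        setup_testing_defaults(environ)
        seen = []
        body = b"".join(cron_app(environ, lambda s, h, e=None: seen.append(s)))
        print("%s: %s" % (seen[0], body.decode("utf-8")))
```

The display pages can then read whatever the cron run stored, instead of re-fetching the feed on each request.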