Sidharth Samant

Trouble with running cron job in GAE python

I made a cron job to update my news aggregator app with new stories every minute. I should say that I'm a complete novice with cron jobs and have very limited experience running GAE.

This is my folder structure:

This is what's in news.py:

feed = ['https://news.google.co.in/news/section?cf=all&pz=1&ned=in&topic=e&ict=ln&output=rss&num=10']

feedlist = []

def render_str(template, **params):
    t = jinja_env.get_template(template)
    return t.render(params)

class CronTask(webapp2.RequestHandler):
    def get(self):
        self.redirect('/entertainment')

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.write(render_str('mainpage.html'))

class Entertainment(webapp2.RequestHandler):
    def get(self):
        rssfeed = feedparser.parse(feed)
        for news in rssfeed.entries:
            new_entry = {'title': news.title, 'url': news.link, 'publisheddate': news.published}
            feedlist.append(new_entry)
        self.redirect('/1/display')

class Display(webapp2.RequestHandler):
    def get(self, page_no):
        is_this_last = False
        list_to_be_displayed_here = feedlist[(int(page_no)-1)*5:int(page_no)*5]
        try:
            is_last = feedlist[int(page_no)*5]
        except:
            is_this_last = True
        self.response.write(render_str('/display.html', page_no=page_no, feedlist=list_to_be_displayed_here, is_this_last=is_this_last))

app = webapp2.WSGIApplication([('/', MainPage),
                           ('/entertainment', Entertainment),
                           ('/([0-9]+)/display', Display),
                           ('/crontask', CronTask)
                            ], debug = True)

I assume this is how cron.yaml is supposed to be set up:

cron:
- description: periodic update of news
  url: /crontask
  target: beta
  schedule: every 1 minute

This is app.yaml:

application: encoded-alpha-139800
version: 1
runtime: python27
api_version: 1 
threadsafe: true

handlers:
- url: /static
  static_dir: static

- url: /crontask
  script: news.py

- url: /.*
  script: news.app

libraries:
- name: jinja2
  version: latest

display.html just displays the feeds' info. Since I didn't know how to implement the cursor() method, I implemented the rudimentary pagination that you see in get() of Display, which slices feedlist.

When I run news.py, I get this traceback:

  File "C:\Program Files (x86)\Google\google_appengine\dev_appserver.py", line 83, in <module>
_run_file(__file__, globals())
  File "C:\Program Files (x86)\Google\google_appengine\dev_appserver.py", line 79, in _run_file
execfile(_PATHS.script_file(script_name), globals_)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 1040, in <module>
main()
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 1033, in main
dev_server.start(options)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 758, in start
options.config_paths, options.app_id)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\application_configuration.py", line 831, in __init__
module_configuration = ModuleConfiguration(config_path, app_id)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\application_configuration.py", line 127, in __init__
self._config_path)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\tools\devappserver2\application_configuration.py", line 424, in _parse_configuration
config, files = appinfo_includes.ParseAndReturnIncludePaths(f)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\appinfo_includes.py", line 82, in ParseAndReturnIncludePaths
appyaml = appinfo.LoadSingleAppInfo(appinfo_file)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\appinfo.py", line 2190, in LoadSingleAppInfo
listener.Parse(app_info)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\yaml_listener.py", line 227, in Parse
self._HandleEvents(self._GenerateEventParameters(stream, loader_class))
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\yaml_listener.py", line 178, in _HandleEvents
raise yaml_errors.EventError(e, event_object)
google.appengine.api.yaml_errors.EventError: threadsafe cannot be enabled with CGI handler: news.py
  in "C:\Users\IBM_ADMIN\Downloads\7c\NewsAggregatorGAE\app.yaml", line 19, column 18

Is it because I'm trying to run almost the entire application through a cron job? Or is there something wrong with my settings, or my entire setup?

Answers (1)

Dan Cornilescu

The problem is that you indicated the handler to be a CGI app in your app.yaml file:

script: news.py

From Request handlers:

When App Engine receives a web request for your application, it calls the handler script that corresponds to the URL, as described in the application's app.yaml configuration file. The Python 2.7 runtime supports the WSGI standard and the CGI standard for backwards compatibility. WSGI is preferred, and some features of Python 2.7 do not work without it. The configuration of your application's script handlers determines whether a request is handled using WSGI or CGI.

...

If you mark your application as thread-safe, concurrent requests will be enabled, which means that App Engine can dispatch multiple requests to each web server in parallel. To do so, set threadsafe: true in app.yaml. Concurrent requests are not available if any script handler uses CGI.

Just make it a WSGI app and this error should go away:

script: news.app

Remember that the GAE cron service is nothing but a generator for GET requests to the configured URLs according to the configured schedule. From Scheduling Tasks With Cron for Python:

The App Engine Cron Service allows you to configure regularly scheduled tasks that operate at defined times or regular intervals. These tasks are commonly known as cron jobs. These cron jobs are automatically triggered by the App Engine Cron Service.

...

A cron job invokes a URL, using an HTTP GET request, at a given time of day. A cron job request is subject to the same limits as those for push task queues.

How your app executes the cron jobs really boils down to how it handles those requests.
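To illustrate, here is a minimal sketch of a cron endpoint written as a bare WSGI callable using only the standard library. It is not the question's code: refresh_feed() and make_app() are hypothetical names, and refresh_feed() is a stand-in for the feed-parsing work. The sketch does the work inside the /crontask request and answers with a 200 rather than redirecting, on the assumption that the cron service does not treat a 3xx response as success. It also checks the X-Appengine-Cron header, which App Engine adds to requests issued by the cron service.

```python
# Minimal WSGI sketch (standard library only) of a cron endpoint.
# Assumptions: refresh_feed() is a hypothetical placeholder for the real
# feed-parsing work; the question's app uses webapp2 handlers instead.

def refresh_feed(store):
    """Hypothetical placeholder: the real app would parse the RSS feed here."""
    store.append({'title': 'example', 'url': 'http://example.com'})
    return len(store)


def make_app(store):
    """Build a WSGI app that refreshes `store` when /crontask is requested."""
    def app(environ, start_response):
        headers = [('Content-Type', 'text/plain')]
        if environ.get('PATH_INFO') != '/crontask':
            start_response('404 Not Found', headers)
            return [b'not found']
        # App Engine adds "X-Appengine-Cron: true" to cron requests;
        # WSGI exposes that header as HTTP_X_APPENGINE_CRON.
        if environ.get('HTTP_X_APPENGINE_CRON') != 'true':
            start_response('403 Forbidden', headers)
            return [b'cron only']
        # Do the work in this request and report success with a 2xx status.
        count = refresh_feed(store)
        start_response('200 OK', headers)
        return [('stored %d entries' % count).encode('utf-8')]
    return app


feedlist = []
app = make_app(feedlist)
```

With webapp2 the same idea would live in CronTask.get(): do the refresh there and write a short response instead of calling self.redirect().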
