user7987134

Reputation: 159

cron job throwing DeadlineExceededError

I am currently working on a Google Cloud project in free trial mode. I have a cron job to fetch data from a data vendor and store it in the Datastore. I wrote the code to fetch the data a couple of weeks ago and it was all working fine, but all of a sudden, for the last two days, I have started receiving the error "DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded". I believe a cron job is supposed to time out only after 60 minutes. Any idea why I am getting this error?

cron task

def run():
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source

        company_list = cron.rest_client.load(config, "companies", '')

        if not company_list:
            logging.info("Company list is empty")
            return "OK"

        for row in company_list:
            company_repository.save(row, original_data_source, actual_data_source)

        return "OK"
    except Exception:
        logging.exception("run() experienced an error")
        raise

Repository code

def save(dto, org_ds, act_dp):
    try:
        key = 'FIN/%s' % (dto['ticker'])
        company = CompanyInfo(id=key)
        company.stock_code = key
        company.ticker = dto['ticker']
        company.name = dto['name']
        company.original_data_source = org_ds
        company.actual_data_provider = act_dp
        company.put()
        return company
    except Exception:
        logging.exception("company_repository: error occurred saving the company record")
        raise
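
For completeness, CompanyInfo is a plain ndb model along these lines (simplified, but the fields match what save() sets):

from google.appengine.ext import ndb

class CompanyInfo(ndb.Model):
    # all plain string fields, written by company_repository.save()
    stock_code = ndb.StringProperty()
    ticker = ndb.StringProperty()
    name = ndb.StringProperty()
    original_data_source = ndb.StringProperty()
    actual_data_provider = ndb.StringProperty()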

RestClient

def load(config, resource, filter):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}

        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")

        current_page = 1
        data = []

        while True:
            if filter:
                url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
            else:
                url = config['xxxx']["endpoints"][resource] % (current_page)

            response = urlfetch.fetch(
                url=url,
                deadline=60,
                method=urlfetch.GET,
                headers=headers,
                follow_redirects=False,
            )

            if response.status_code != 200:
                logging.error("xxxx GET received status code %d!" % (response.status_code))
                logging.error("error happened for url: %s with headers %s", url, headers)
                return 'Sorry, xxxx API request failed', 500

            db = json.loads(response.content)

            if not db['data']:
                break

            data.extend(db['data'])

            if db['total_pages'] == current_page:
                break

            current_page += 1

        return data
    except Exception:
        logging.exception("Error occurred with xxxx API request")
        raise

Upvotes: 0

Views: 587

Answers (2)

Alex

Reputation: 5276

I'm guessing this is the same question as this, but now with more code: DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded

I modified your code to write to the database after each urlfetch. If there are more pages, it relaunches itself in a deferred task, so each run finishes well within the 10-minute task deadline.

Uncaught exceptions in a deferred task cause it to retry, so be mindful of that.
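
If you hit an error that you know a retry won't fix, you can raise deferred.PermanentTaskFailure instead; the deferred library logs it and drops the task rather than retrying. A minimal sketch (process_page is a hypothetical stand-in for the fetch-and-save work):

from google.appengine.ext import deferred

def run(current_page=0):
    try:
        process_page(current_page)  # hypothetical helper doing the fetch + save
    except ValueError:
        # PermanentTaskFailure is logged as an error, but the task is
        # treated as finished, so the queue will NOT retry it
        raise deferred.PermanentTaskFailure("bad vendor response, giving up")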

It was unclear to me how actual_data_source & original_data_source worked, but I think you should be able to modify that part.

cron task

def run(current_page=0):
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source

        data, more = cron.rest_client.load(config, "companies", '', current_page)

        for row in data:
            company_repository.save(row, original_data_source, actual_data_source)

        # fetch the rest
        if more:
            deferred.defer(run, current_page + 1)
    except Exception as e:
        logging.exception("run() experienced an error: %s" % e)

RestClient

def load(config, resource, filter, current_page):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}

        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")

            url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
        else:
            url = config['xxxx']["endpoints"][resource] % (current_page)

        response = urlfetch.fetch(
            url=url,
            deadline=60,
            method=urlfetch.GET,
            headers=headers,
            follow_redirects=False,
        )

        if response.status_code != 200:
            logging.error("xxxx GET received status code %d!" % (response.status_code))
            logging.error("error happened for url: %s with headers %s", url, headers)
            return [], False

        db = json.loads(response.content)

        return db['data'], (db['total_pages'] != current_page)

    except Exception as e:
        logging.exception("Error occurred with xxxx API request: %s" % e)
        return [], False

Upvotes: 1

Momus

Reputation: 394

I would prefer to write this as a comment, but I need more reputation to do that.

  1. What happens when you run the actual data fetch directly instead of through the cron job?
  2. Have you tried measuring a time delta from the start to the end of the job? (See the sketch after this list.)
  3. Has the number of companies being retrieved increased dramatically?
  4. You appear to be doing some form of stock quote aggregation - is it possible that the provider has started blocking you?
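
For point 2, wrapping the job body with a couple of timestamps would already tell you where the time goes; a rough sketch based on the code in the question:

import logging
import time

def run():
    start = time.time()
    config = cron.config
    company_list = cron.rest_client.load(config, "companies", '')
    logging.info("vendor fetch took %.1fs", time.time() - start)

    for row in company_list:
        company_repository.save(row, config['xxx']['xxxx'], config['xxx']['xxxx'])
    logging.info("fetch + datastore writes took %.1fs total", time.time() - start)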

Upvotes: 0
