anthony

Reputation: 7

Google Cloud Function can't be invoked

I have a cloud function whose code works fine when I test it locally. However, it doesn't work as a cloud function even though it deploys successfully. After deploying, I tried adding allUsers as a Cloud Function invoker. Ingress settings are set to allow all web traffic.

I get a 500 error when visiting the URL, and it says "Error: could not handle the request".

Cloud Scheduler constantly fails, and the logs for the cloud function don't really help give any understanding as to why it fails.

When expanded, the logs give no further detail either.

I've got no idea what else to try to resolve this issue. I just want to be able to invoke my HTTP cloud function on a schedule; the code works fine when run and tested using a service account, so why doesn't it work when deployed as the function?

Here is the code I'm using:

from bs4 import BeautifulSoup
import pandas as pd
import constants as const
from google.cloud import storage
import os
import json
from datetime import datetime
from google.cloud import bigquery
import re
from flask import escape

#service_account_path = os.path.join("/Users/nbamodel/nba-data-keys.json")
#os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = service_account_path

client = storage.Client()
bucket = client.get_bucket(const.destination_gcs_bucket)

def scrape_team_data(request):
    """HTTP Cloud Function.
    Args:
        request (flask.Request): The request object.
        <http://flask.pocoo.org/docs/1.0/api/#flask.Request>
    Returns:
        The response text, or any set of values that can be turned into a
        Response object using `make_response`
        <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>.
    """

    headers = [
        'Rank',
        'Team',
        'Age',
        'Wins',
        'Losses',
        'PW',
        'PL',
        'MOV',
        'SOS',
        'SRS',
        'ORtg',
        'DRtg',
        'NRtg',
        'Pace',
        'FTr',
        '_3PAr',
        'TS_pct',
        'offense_eFG_pct',
        'offense_TOV_pct',
        'offense_ORB_pct',
        'offense_FT_FGA',
        'defense_eFG_pct',
        'defense_TOV_pct',
        'defense_DRB_pct',
        'defense_FT_FGA',
        'Arena',
        'Attendance',
        'Attendance_Game'
        ]

    r = requests.get('https://www.basketball-reference.com/leagues/NBA_2020.html')
    matches = re.findall(r'id=\"misc_stats\".+?(?=table>)table>', r.text, re.DOTALL)
    find_table = pd.read_html('<table ' + matches[0])
    df = find_table[0]
    df.columns = headers
    filename = 'teams_data_adv_stats' #+ datetime.now().strftime("%Y%m%d")
    df.to_json(filename, orient='records', lines=True)

    print(filename)

    # Push data to GCS
    blob = bucket.blob(filename)

    blob.upload_from_filename(
        filename=filename,
        content_type='application/json'
    )

    # Create BQ table from data in bucket
    client = bigquery.Client()
    dataset_id = 'nba_model'

    dataset_ref = client.dataset(dataset_id)
    job_config = bigquery.LoadJobConfig()
    job_config.create_disposition = 'CREATE_IF_NEEDED'
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    uri = "gs://nba_teams_data/{}".format(filename)

    load_job = client.load_table_from_uri(
        uri,
        dataset_ref.table("teams_data"),
        location="US",  # Location must match that of the destination dataset.
        job_config=job_config,
    )  # API request
    print("Starting job {}".format(load_job.job_id))

    load_job.result()  # Waits for table load to complete.
    print("Job finished.")

    destination_table = client.get_table(dataset_ref.table("teams_data"))
    print("Loaded {} rows.".format(destination_table.num_rows))

    return

Upvotes: 0

Views: 1037

Answers (1)

Ajordat

Reputation: 1392

I have deployed your code into a Cloud Function and it's failing for two reasons.

First, it's missing the requests import, so the line import requests has to be added at the top of the file, with the other imports.
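For reference, a sketch of what the top of the file looks like with that single line added (everything else stays as it is in your code):

    import requests  # new: needed by the requests.get(...) call inside scrape_team_data
    from bs4 import BeautifulSoup
    import pandas as pd
    # ... the rest of your imports, unchanged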

Second, it seems like your code is trying to write a file to a read-only file system; the write is immediately rejected by the OS and the function gets terminated. That write is done by the DataFrame.to_json method, which tries to write content to the file teams_data_adv_stats so it can later be uploaded to a GCS bucket.

There are two ways that you can work around this issue:

  1. Create the file in the temporary folder. As explained in the documentation, you cannot write to the file system, with the exception of the /tmp directory. I managed to succeed with this method using the following modified lines (there's also a small cleanup note on /tmp after this list):

     filename = 'teams_data_adv_stats'
     path = os.path.join('/tmp', filename)
     df.to_json(path, orient='records', lines=True)
     blob = bucket.blob(filename)
    
     blob.upload_from_filename(
         filename=path,
         content_type='application/json'
     )
    
  2. Avoid creating a file and work with a string instead. Rather than upload_from_filename, I suggest you use upload_from_string. I managed to succeed with this method using the following modified lines:

     filename = 'teams_data_adv_stats'
     data_json = df.to_json(orient='records', lines=True)
     blob = bucket.blob(filename)
    
     blob.upload_from_string(
         data_json,
         content_type='application/json'
     )
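One extra note on the first option: in Cloud Functions, /tmp is an in-memory file system, so anything written there counts against the function's memory. If you go that route, it may be worth deleting the file once the upload is done. A small sketch based on the lines above:

    path = os.path.join('/tmp', filename)
    df.to_json(path, orient='records', lines=True)
    try:
        bucket.blob(filename).upload_from_filename(
            filename=path,
            content_type='application/json'
        )
    finally:
        os.remove(path)  # /tmp is memory-backed, so free it when finished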
    

As a heads-up, you can test your Cloud Function from the Testing tab on the function's details page. I recommend using it; it's what I worked with to troubleshoot your issue, and it's handy to know about. Also bear in mind that there's an ongoing issue with logs for failing Cloud Functions on the python37 runtime that prevents the error message from appearing. I ran into it while working on your function and got past it using the workaround described there.
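If you'd rather iterate outside the console, you can also call the function locally with a mocked request object, since scrape_team_data never reads the request. A minimal sketch, assuming the function lives in main.py and that your local environment has credentials and the constants module available (local_test.py is a hypothetical helper, not part of your code):

    # local_test.py
    from unittest.mock import Mock

    from main import scrape_team_data

    # scrape_team_data ignores the request body, so a bare Mock is enough here
    fake_request = Mock(args={}, get_json=Mock(return_value={}))
    print(scrape_team_data(fake_request))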

As a side note, since you didn't provide a requirements.txt file, I did all the reproduction with the following one in order to deploy and run successfully. I assume it's correct:

beautifulsoup4==4.9.1
Flask==1.1.2
google-cloud-bigquery==1.27.2
google-cloud-storage==1.30.0
lxml==4.5.2
pandas==1.1.1
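Note that requests isn't listed there, yet the function ran once the import was added, so it must be available through one of the other packages or the runtime itself. If you prefer to make the dependency explicit, you could add it to requirements.txt as well, for example:

    beautifulsoup4==4.9.1
    Flask==1.1.2
    google-cloud-bigquery==1.27.2
    google-cloud-storage==1.30.0
    lxml==4.5.2
    pandas==1.1.1
    requests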

Upvotes: 1
