ASH

Reputation: 20342

Google Cloud Function Throwing Weird Error

Is anyone here familiar with Google Cloud Functions? I read their documentation and, based on that, adapted my script to run in their hosted environment.

https://cloud.google.com/functions/docs/concepts/python-runtime

So, my Python script looks like this.

def main():

    import requests
    import numpy
    import pandas
    import datetime
    import pandas_gbq
    import xml.etree.ElementTree


    # authentication: working....
    login = 'my_email' 
    password = 'my_password'


    AsOfDate = datetime.datetime.today().strftime('%m-%d-%Y')

    #step into URL
    REQUEST_URL = 'https://www.business.com/report-api/device=779142&rdate=Yesterday'
    response = requests.get(REQUEST_URL, auth=(login, password))
    xml_data = response.text.encode('utf-8', 'ignore') 

    #tree = etree.parse(xml_data)
    root = xml.etree.ElementTree.fromstring(xml_data)

    # start collecting root elements and headers for data frame 1
    desc = root.get("Description")
    frm = root.get("From")
    thru = root.get("Thru")
    loc = root.get("locations")
    loc = loc[:-1]
    df1 = pandas.DataFrame([['From:',frm],['Through:',thru],['Location:',loc]])
    df1.columns = ['S','Analytics']
    #print(df1)

    # start getting the analytics for data frame 2
    data = [['Goal:', root[0][0].text],
            ['Actual:', root[0][1].text],
            ['Compliant:', root[0][2].text],
            ['Errors:', root[0][3].text],
            ['Checks:', root[0][4].text]]
    df2 = pandas.DataFrame(data)
    df2.columns = ['S','Analytics']
    #print(df2)

    # merge data frame 1 with data frame 2
    df3 = df1.append(df2, ignore_index=True)
    #print(df3)

    # append description and today's date onto data frame
    df3['Description'] = desc
    df3['AsOfDate'] = AsOfDate


    # push from data frame, where data has been transformed, into Google BQ
    pandas_gbq.to_gbq(df3, 'Metrics', 'analytics', chunksize=None,
                      reauth=False, if_exists='append', private_key=None,
                      auth_local_webserver=False, table_schema=None,
                      location=None, progress_bar=True, verbose=None)
    print('Execute Query, Done!!')

if __name__ == '__main__':
    main()

Also, my requirements.txt looks like this.

requests
numpy
pandas
datetime
requests
pandas_gbq
xml.etree.ElementTree

My script has been working fine for the past 2+ months, but I have to run it on my laptop each day. To get away from this manual process, I am trying to get it running in the cloud. The problem is that I keep getting an error message that reads: TypeError: main() takes 0 positional arguments but 1 was given

To me, it looks like no arguments are given and no arguments are expected, but somehow Google is saying 1 argument is given. Can I modify my code slightly to get this to work, or somehow bypass this seemingly benign error? Thanks.

Upvotes: 2

Views: 1971

Answers (2)

John Hanley

Reputation: 81454

The following takes your code and changes it to run in Google Cloud Functions using an HTTP trigger. You can then use Google Cloud Scheduler to call your function on schedule. You will also need to create a requirements.txt with the modules that you need to import. See this document for more information.

def handler(request):

    import requests
    import numpy
    import pandas
    import datetime
    import pandas_gbq
    import xml.etree.ElementTree


    # authentication: working....
    login = 'my_email' 
    password = 'my_password'


    AsOfDate = datetime.datetime.today().strftime('%m-%d-%Y')

    #step into URL
    REQUEST_URL = 'https://www.business.com/report-api/device=779142&rdate=Yesterday'
    response = requests.get(REQUEST_URL, auth=(login, password))
    xml_data = response.text.encode('utf-8', 'ignore') 

    #tree = etree.parse(xml_data)
    root = xml.etree.ElementTree.fromstring(xml_data)

    # start collecting root elements and headers for data frame 1
    desc = root.get("Description")
    frm = root.get("From")
    thru = root.get("Thru")
    loc = root.get("locations")
    loc = loc[:-1]
    df1 = pandas.DataFrame([['From:',frm],['Through:',thru],['Location:',loc]])
    df1.columns = ['S','Analytics']
    #print(df1)

    # start getting the analytics for data frame 2
    data = [['Goal:', root[0][0].text],
            ['Actual:', root[0][1].text],
            ['Compliant:', root[0][2].text],
            ['Errors:', root[0][3].text],
            ['Checks:', root[0][4].text]]
    df2 = pandas.DataFrame(data)
    df2.columns = ['S','Analytics']
    #print(df2)

    # merge data frame 1 with data frame 2
    df3 = df1.append(df2, ignore_index=True)
    #print(df3)

    # append description and today's date onto data frame
    df3['Description'] = desc
    df3['AsOfDate'] = AsOfDate


    # push from data frame, where data has been transformed, into Google BQ
    pandas_gbq.to_gbq(df3, 'Metrics', 'analytics', chunksize=None,
                      reauth=False, if_exists='append', private_key=None,
                      auth_local_webserver=False, table_schema=None,
                      location=None, progress_bar=True, verbose=None)
    # print('Execute Query, Done!!')

    # Normally for an HTTP trigger you would return a full HTML page here
    # <html><head></head><body>you get the idea</body></html>
    return 'Execute Query, Done!!'
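
Note that the request parameter the entry point receives is a Flask request object; that is also why the original main() failed with "takes 0 positional arguments but 1 was given". You can ignore it, or use it to pass per-invocation settings. A minimal sketch, where the rdate query parameter is purely illustrative (Cloud Functions does not require it):

def handler(request):
    # request is the flask.Request that Cloud Functions passes to
    # HTTP-triggered functions; it can carry per-invocation settings.
    rdate = request.args.get('rdate', 'Yesterday')  # illustrative query parameter
    return 'Report date: {}'.format(rdate)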

Upvotes: 2

Doug Stevenson

Reputation: 317808

You're misunderstanding how Cloud Functions works. It doesn't let you simply run arbitrary scripts. You write functions that are triggered by HTTP requests, or by changes elsewhere in your Cloud project. That doesn't seem to be what you're doing here. Cloud Functions deployments don't use main().

You might want to read the overview documentation to get an understanding of what Cloud Functions is used for.

If you're trying to run something periodically, consider writing an HTTP trigger and having it invoked by a cron-like service at the rate you want.
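
A minimal sketch of that pattern, assuming an HTTP-triggered function (the names daily_report and run_report are placeholders, not anything Cloud Functions requires):

def run_report():
    # Placeholder for the body of the original script: fetch the XML,
    # build the data frames, and push to BigQuery.
    pass

def daily_report(request):
    # Cloud Functions passes a Flask request object to HTTP-triggered
    # functions; a zero-argument function raises the TypeError above.
    run_report()
    return 'OK'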

Upvotes: 1
