Reputation: 125
Against a BigQuery table, I'm trying to run a SQL statement calling a UDF. This statement is executed within a Python script and the call is made via the BigQuery API.
When I execute a simple SQL statement without a UDF, it works fine. However, I keep getting the same error when I try to use a UDF script (stored either locally or in a GCS bucket). This what I get on my local Terminal (I run the script via Python Launcher):
Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/googleapiclient/http.py", line 840, in execute raise HttpError(resp, content, uri=self.uri) googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/[projectId]/queries?alt=json returned "Required parameter is missing">
And this is my Python script:
credentials = SignedJwtAssertionCredentials(
SERVICE_ACCOUNT_EMAIL,
key,
scope='https://www.googleapis.com/auth/bigquery')
aservice = build('bigquery','v2',credentials=credentials)
query_requestb = aservice.jobs()
query_data = {
'configuration': {
'query': {
'userDefinedFunctionResources': [
{
'resourceUri': 'gs://[bucketName]/[fileName].js'
}
],
'query': sql
}
},
'timeoutMs': 100000
}
query_response = query_requestb.query(projectId=PROJECT_NUMBER,body=query_data).execute(num_retries=0)
Any idea what 'parameter is missing' or how I can get this to run?
Upvotes: 0
Views: 2843
Reputation: 125
The query I wanted to run was to categorise traffic and sales by marketing channel which I usually use a UDF for. This is the query I ran using standard SQL
. This query is stored in a file which I read and store in the variable sql
:
CREATE TEMPORARY FUNCTION
mktchannels(source STRING,
medium STRING,
campaign STRING)
RETURNS STRING
LANGUAGE js AS """
return channelGrouping(source,medium,campaign) // where channelGrouping is the function in my channelgrouping.js file which contains the attribution rules
""" OPTIONS ( library=["gs://[bucket]/[path]/regex.js",
"gs://[bucket]/[path]/channelgrouping.js"] );
WITH
traffic AS ( // select fields from the BigQuery table
SELECT
device.deviceCategory AS device,
trafficSource.source AS source,
trafficSource.medium AS medium,
trafficSource.campaign AS campaign,
SUM(totals.visits) AS sessions,
SUM(totals.transactionRevenue)/1e6 as revenue,
SUM(totals.transactions) as transactions
FROM
`[datasetId].[table]`
GROUP BY
device,
source,
medium,
campaign)
SELECT
mktchannels(source,
medium,
campaign) AS channel, // call the temp function set above
device,
SUM(sessions) AS sessions,
SUM(transactions) as transactions,
ROUND(SUM(revenue),2) as revenue
FROM
traffic
GROUP BY
device,
channel
ORDER BY
channel,
device;
And then in the Python script:
fd = file('myquery.sql', 'r')
sql = fd.read()
fd.close()
query_data = {
'query': sql,
'maximumBillingTier': 10,
'useLegacySql': False,
'timeoutMs': 300000
}
Hope this helps anyone in the future!
Upvotes: 0
Reputation: 33765
Instead of specifying userDefinedFunctionResources
, use CREATE TEMP FUNCTION
in the body of your 'query'
with the library referenced as part of the OPTIONS
clause. You will need to use standard SQL for this, and you can also refer to the documentation on user-defined functions. Your query would look something like this:
#standardSQL
CREATE TEMP FUNCTION MyJsFunction(x FLOAT64) RETURNS FLOAT64 LANGUAGE js AS """
return my_js_function(x);
"""
OPTIONS (library='gs://[bucketName]/[fileName].js');
SELECT MyJsFunction(x)
FROM MyTable;
Upvotes: 4