Reputation: 125
How can I join two tables in a select statement in which I also use a UDF? I stored the SQL query and UDF function in two files that I call via the bq command line. However, when I run it, I get the following error:
BigQuery error in query operation: Error processing job '[projectID]:bqjob_[error_number]': Table name cannot be resolved: dataset name is missing.
Note that I'm logged in the correct project via the gcloud auth method. My SQL statement:
SELECT
substr(date,1,6) as date,
device,
channelGroup,
COUNT(DISTINCT CONCAT(fullVisitorId,cast(visitId as string))) AS sessions,
COUNT(DISTINCT fullVisitorId) AS users,
FROM
defaultChannelGroup(
SELECT
a.date,
a.device.deviceCategory AS device,
b.hits.page.pagePath AS page,
a.fullVisitorId,
a.visitId,
a.trafficSource.source AS trafficSourceSource,
a.trafficSource.medium AS trafficSourceMedium,
a.trafficSource.campaign AS trafficSourceCampaign
FROM FLATTEN(
SELECT date,device.deviceCategory,trafficSource.source,trafficSource.medium,trafficSource.campaign,fullVisitorId,visitID
FROM
TABLE_DATE_RANGE([datasetname.ga_sessions_],TIMESTAMP('2016-10-01'),TIMESTAMP('2016-10-31'))
,hits) as a
LEFT JOIN FLATTEN(
SELECT hits.page.pagePath,hits.time,visitID,fullVisitorId
FROM
TABLE_DATE_RANGE([datasetname.ga_sessions_],TIMESTAMP('2016-10-01'),TIMESTAMP('2016-10-31'))
WHERE
hits.time = 0
and trafficSource.medium = 'organic'
,hits) as b
ON a.fullVisitorId = b.fullVisitorId AND a.visitID = b.visitID
)
GROUP BY
date,
device,
channelGroup
ORDER BY sessions DESC
where I replaced my datasetname with the correct name of course; and some of the UDF (which works with another query):
function defaultChannelGroup(row, emit)
{
function output(channelGroup) {
emit({channelGroup:channelGroup,
fullVisitorId: row.fullVisitorId,
visitId: row.visitId,
device: row.device,
date: row.date
});
}
computeDefaultChannelGroup(row, output);
}
bigquery.defineFunction(
'defaultChannelGroup',
['date', 'device', 'page', 'trafficSourceMedium', 'trafficSourceSource', 'trafficSourceCampaign', 'fullVisitorId', 'visitId'],
//['device', 'page', 'trafficSourceMedium', 'trafficSourceSource', 'trafficSourceCampaign', 'fullVisitorId', 'visitId'],
[{'name': 'channelGroup', 'type': 'string'},
{'name': 'fullVisitorId', 'type': 'string'},
{'name': 'visitId', 'type': 'integer'},
{'name': 'device', 'type': 'string'},
{'name': 'date', 'type': 'string'}
],
defaultChannelGroup
);
Upvotes: 2
Views: 851
Reputation: 125
The select statements within the FLATTEN function needed to be in brackets.
Ran the bq command in the shell:
bq query --udf_resource=udf.js "$(cat query.sql)"
query.sql contains the following scripts:
SELECT
substr(date,1,6) as date,
device,
channelGroup,
COUNT(DISTINCT CONCAT(fullVisitorId,cast(visitId as string))) AS sessions,
COUNT(DISTINCT fullVisitorId) AS users,
COUNT(DISTINCT transactionId) as orders,
CAST(SUM(transactionRevenue)/1000000 AS INTEGER) as sales
FROM
defaultChannelGroup(
SELECT
a.date as date,
a.device.deviceCategory AS device,
b.hits.page.pagePath AS page,
a.fullVisitorId as fullVisitorId,
a.visitId as visitId,
a.trafficSource.source AS trafficSourceSource,
a.trafficSource.medium AS trafficSourceMedium,
a.trafficSource.campaign AS trafficSourceCampaign,
a.hits.transaction.transactionRevenue as transactionRevenue,
a.hits.transaction.transactionID as transactionId
FROM FLATTEN((
SELECT date,device.deviceCategory,trafficSource.source,trafficSource.medium,trafficSource.campaign,fullVisitorId,visitID,
hits.transaction.transactionID, hits.transaction.transactionRevenue
FROM
TABLE_DATE_RANGE([datasetname.ga_sessions_],TIMESTAMP('2016-10-01'),TIMESTAMP('2016-10-31'))
),hits) as a
LEFT JOIN FLATTEN((
SELECT hits.page.pagePath,hits.time,trafficSource.medium,visitID,fullVisitorId
FROM
TABLE_DATE_RANGE([datasetname.ga_sessions_],TIMESTAMP('2016-10-01'),TIMESTAMP('2016-10-31'))
WHERE
hits.time = 0
and trafficSource.medium = 'organic'
),hits) as b
ON a.fullVisitorId = b.fullVisitorId AND a.visitID = b.visitID
)
GROUP BY
date,
device,
channelGroup
ORDER BY sessions DESC
and udf.js contains the following function (the 'computeDefaultChannelGroup' function is not included):
function defaultChannelGroup(row, emit)
{
function output(channelGroup) {
emit({channelGroup:channelGroup,
date: row.date,
fullVisitorId: row.fullVisitorId,
visitId: row.visitId,
device: row.device,
transactionId: row.transactionId,
transactionRevenue: row.transactionRevenue,
});
}
computeDefaultChannelGroup(row, output);
}
bigquery.defineFunction(
'defaultChannelGroup',
['date', 'device', 'page', 'trafficSourceMedium', 'trafficSourceSource', 'trafficSourceCampaign', 'fullVisitorId', 'visitId', 'transactionId', 'transactionRevenue'],
[{'name': 'channelGroup', 'type': 'string'},
{'name': 'date', 'type': 'string'},
{'name': 'fullVisitorId', 'type': 'string'},
{'name': 'visitId', 'type': 'integer'},
{'name': 'device', 'type': 'string'},
{'name': 'transactionId', 'type': 'string'},
{'name': 'transactionRevenue', 'type': 'integer'}
],
defaultChannelGroup
);
Ran without error and matched the data in Google Analytics.
Upvotes: 1