Reputation: 21
We love BigQuery it is amazing and we use it a lot but we have an issue with the Select statement, for some reason if you create a Select based on Variables you defined it does not work, but when you use the full code of the variables it does
The error I get is the following -
Error: (L5:47): Expression 'RequestsPerSession' is not present in the GROUP BY list
off course we do not want to aggregate based on this variable - there for it is no in the GROUP BY section.
This does not work
SELECT
ch,
COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
ROUND(Requests/UniqueSessions,1) AS RequestsPerSession
This does work
SELECT
ch,
COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
ROUND(COUNT(IF(Action CONTAINS "request_data",1,NULL))/UniqueSessions,1) AS RequestsPerSession
Any Ideas on how to fix it?
Also is there a way to create a variable within the select function for calculation purposes but not have it show up in the final results?
Upvotes: 1
Views: 125
Reputation: 172993
Any Ideas on how to fix it?
try below "workaround"
SELECT
ch,
COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
ROUND(Requests/UniqueSessions,1) * MAX(1) AS RequestsPerSession
I realized that looks like engine needs some hint to understand that this particular field (RequestsPerSession) is not for grouping by but rather aggregation.
Below is how I "tested"
SELECT id,
COUNT(a) AS b,
SUM(a) AS c,
COUNT(a)/SUM(a) AS k1,
b/SUM(a) AS k2,
COUNT(a)/c AS k3,
MAX(1) * (b/c) AS k4
FROM
(SELECT 1 AS id, 1 AS a),
(SELECT 1 AS id, 2 AS a),
(SELECT 1 AS id, 3 AS a),
(SELECT 1 AS id, 4 AS a),
(SELECT 2 AS id, 1 AS a),
(SELECT 2 AS id, 2 AS a),
(SELECT 2 AS id, 3 AS a),
GROUP BY id
Upvotes: 0
Reputation: 172993
Any Ideas on how to fix it?
The GROUP BY clause allows you to group rows that have the same values for a given field or set of fields so that you can compute aggregations of related fields. Thus, in SELECT list you can have either fields that you group by or aggregations
Accepting above - your first example fails as expected and second one works as expected.
So, nothing to fix really!
Is there a way to create a variable within the select function for calculation purposes but not have it show up in the final results?
You can use subquery for this
For example assuming your original query is:
SELECT
ch,
COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
ROUND(COUNT(IF(Action CONTAINS "request_data",1,NULL))/UniqueSessions,1) AS RequestsPerSession
FROM YourTable
GROUP BY ch
With subquery it can look as something like below:
SELECT
ch,
COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
COUNT(request) AS Requests,
ROUND(COUNT(request)/COUNT(DISTINCT(AC_SessionID),15000000),1) AS RequestsPerSession
FROM (
SELECT
ch,
AC_SessionID,
IF(Action CONTAINS "request_data",1,NULL) AS request
FROM YourTable
)
GROUP BY ch
which can be further "transformed" to
SELECT
ch,
UniqueSessions,
Requests,
ROUND(Requests/UniqueSessions, 1) AS RequestsPerSession
FROM (
SELECT
ch,
COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
COUNT(request) AS Requests,
FROM (
SELECT
ch,
AC_SessionID,
IF(Action CONTAINS "request_data",1,NULL) AS request
FROM YourTable
)
GROUP BY ch
)
At to which extend to go with such "optimization" is upto personal preferences, I think
As for the subquery option, doesn't it increase the volume of data that we run on as it is double the data (almost double) - we have a lot of data so costs does start to an issue
Within the same query you can have multiple subqueries against the same table(s) and you will be charged only as you do it ones! So far, this is how billing works. So you shouldn't worry about it. BigQuery is smart enough to optimize actual data usage, so performance also shouldn't be of much concern I think
Upvotes: 1