Dan.G
Dan.G

Reputation: 21

BigQuery Issue with Select when using AS

We love BigQuery it is amazing and we use it a lot but we have an issue with the Select statement, for some reason if you create a Select based on Variables you defined it does not work, but when you use the full code of the variables it does

The error I get is the following -

Error: (L5:47): Expression 'RequestsPerSession' is not present in the GROUP BY list

off course we do not want to aggregate based on this variable - there for it is no in the GROUP BY section.

This does not work

SELECT
  ch,
  COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
  COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
  ROUND(Requests/UniqueSessions,1) AS RequestsPerSession

This does work

SELECT
  ch,
  COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
  COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
  ROUND(COUNT(IF(Action CONTAINS "request_data",1,NULL))/UniqueSessions,1) AS RequestsPerSession

Any Ideas on how to fix it?

Also is there a way to create a variable within the select function for calculation purposes but not have it show up in the final results?

Upvotes: 1

Views: 125

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Any Ideas on how to fix it?

try below "workaround"

SELECT
  ch,
  COUNT(Distinct(AC_SessionID),15000000) As UniqueSessions,
  COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
  ROUND(Requests/UniqueSessions,1) * MAX(1) AS RequestsPerSession

I realized that looks like engine needs some hint to understand that this particular field (RequestsPerSession) is not for grouping by but rather aggregation.

Below is how I "tested"

SELECT id, 
  COUNT(a) AS b,
  SUM(a) AS c,
  COUNT(a)/SUM(a) AS k1,
  b/SUM(a) AS k2,
  COUNT(a)/c AS k3,
  MAX(1) * (b/c) AS k4
FROM
(SELECT 1 AS id, 1 AS a),
(SELECT 1 AS id, 2 AS a),
(SELECT 1 AS id, 3 AS a),
(SELECT 1 AS id, 4 AS a),
(SELECT 2 AS id, 1 AS a),
(SELECT 2 AS id, 2 AS a),
(SELECT 2 AS id, 3 AS a),
GROUP BY id

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Any Ideas on how to fix it?

The GROUP BY clause allows you to group rows that have the same values for a given field or set of fields so that you can compute aggregations of related fields. Thus, in SELECT list you can have either fields that you group by or aggregations
Accepting above - your first example fails as expected and second one works as expected.

So, nothing to fix really!

Is there a way to create a variable within the select function for calculation purposes but not have it show up in the final results?

You can use subquery for this
For example assuming your original query is:

SELECT
  ch,
  COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
  COUNT(IF(Action CONTAINS "request_data",1,NULL)) AS Requests,
  ROUND(COUNT(IF(Action CONTAINS "request_data",1,NULL))/UniqueSessions,1) AS RequestsPerSession
FROM YourTable
GROUP BY ch

With subquery it can look as something like below:

SELECT
  ch,
  COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
  COUNT(request) AS Requests,
  ROUND(COUNT(request)/COUNT(DISTINCT(AC_SessionID),15000000),1) AS RequestsPerSession
FROM (
  SELECT 
    ch, 
    AC_SessionID, 
    IF(Action CONTAINS "request_data",1,NULL) AS request
  FROM YourTable
)
GROUP BY ch

which can be further "transformed" to

SELECT 
  ch,
  UniqueSessions,
  Requests,
  ROUND(Requests/UniqueSessions, 1) AS RequestsPerSession
FROM (
  SELECT
    ch,
    COUNT(DISTINCT(AC_SessionID),15000000) AS UniqueSessions,
    COUNT(request) AS Requests,
  FROM (
    SELECT 
      ch, 
      AC_SessionID, 
      IF(Action CONTAINS "request_data",1,NULL) AS request
    FROM YourTable
  )
  GROUP BY ch
)

At to which extend to go with such "optimization" is upto personal preferences, I think

As for the subquery option, doesn't it increase the volume of data that we run on as it is double the data (almost double) - we have a lot of data so costs does start to an issue

Within the same query you can have multiple subqueries against the same table(s) and you will be charged only as you do it ones! So far, this is how billing works. So you shouldn't worry about it. BigQuery is smart enough to optimize actual data usage, so performance also shouldn't be of much concern I think

Upvotes: 1

Related Questions