ankur agrawal
ankur agrawal

Reputation: 41

Bigquery query API - issue with Array

I am trying to run below query -

        select prd_cat, product_category from
        (
            select split( product_category,".") as prd_cat,product_category  from 
test_dataset.cosme_raw_table  
where product_link = "XXX"
        ) as a
        group by prd_cat,product_category;

when I ran it using BigQuery Web interface it ran successfully but when I tried to run it using BigQuery Query API it failed with error message "Grouping by expressions of type ARRAY is not allowed at [6:10]" Below is my code -

        String query = "select prd_cat, product_category" +
                " from\n" +
                "(\n" +
                "select split( product_category,\".\") as prd_cat," +
                "product_category  " +
                "from test_dataset.cosme_raw_table  \n" +
                "where product_link = \"XXX\"\n" +
                ") as a\n" +
                "group by prd_cat,product_category";

        QueryJobConfiguration queryJobConfiguration =
                QueryJobConfiguration.newBuilder(query)
                        .setDestinationTable(tableId1)
                      .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
                        .build();

        Job loadJob1 = bigquery.create(JobInfo.of(queryJobConfiguration));

Below are the logs -

        com.google.cloud.bigquery.BigQueryException: Grouping by expressions of type ARRAY is not allowed at [6:10]
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:99)
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:401)
at com.google.cloud.bigquery.BigQueryImpl$23.call(BigQueryImpl.java:688)
at com.google.cloud.bigquery.BigQueryImpl$23.call(BigQueryImpl.java:683)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:682)
at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:674)
at com.google.cloud.bigquery.Job$1.call(Job.java:329)
at com.google.cloud.bigquery.Job$1.call(Job.java:326)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
at com.google.cloud.RetryHelper.poll(RetryHelper.java:63)
at com.google.cloud.bigquery.Job.waitForQueryResults(Job.java:325)
at com.google.cloud.bigquery.Job.waitFor(Job.java:240)
at TestBigQuery.explicit(TestBigQuery.java:190)
at TestBigQuery.main(TestBigQuery.java:32)
        Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
        {
          "code" : 400,
          "errors" : [ {
"domain" : "global",
"location" : "parameters.q",
"message" : "Grouping by expressions of type ARRAY is not allowed at [6:10]",
"reason" : "invalidQuery"
          } ],
          "message" : "Grouping by expressions of type ARRAY is not allowed at [6:10]",
          "status" : "INVALID_ARGUMENT"
        }

Can some one please help. Thanks!!

Upvotes: 1

Views: 1257

Answers (2)

Elliott Brossard
Elliott Brossard

Reputation: 33705

If you use legacy SQL, the GROUP BY operator will implicitly flatten any arrays that you are grouping. If you use standard SQL, you need to flatten the array explicitly. Note that:

  • BigQuery's classic UI uses legacy SQL by default, but you can change the dialect in the query options.
  • BigQuery's new UI (the one that is part of the Cloud console) uses standard SQL by default.
  • BigQuery's client libraries use standard SQL by default.

You can fix your query to work using standard SQL by flattening the array, e.g.:

select prd_cat, product_category
from test_dataset.cosme_raw_table,
  UNNEST(split( product_category,".")) as prd_cat
where product_link = "XXX"
group by prd_cat,product_category;

It's not clear to me what result you are hoping to get with the query, but at least it should run.

Upvotes: 0

F10
F10

Reputation: 2883

This is happening because you're using Legacy SQL. You need to set it in your QueryJobConfiguration. For instance:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.DatasetInfo;
import com.google.cloud.bigquery.FieldValue;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class QuickstartSample {
  public static void main(String... args) throws Exception {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    String query = "Your-Query";
    //setUseLegacySql(true) below
    QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).setUseLegacySql(true).build();
    for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
        for (FieldValue val : row) {
             System.out.printf("%s,", val.toString());
        }
        System.out.printf("\n");
    }
  }
}

Otherwise, you could use the TO_JSON_STRING with Standard SQL. For instance:

String query =  "WITH sample AS (SELECT 1 id, ['a,b', 'c'] a UNION ALL SELECT 1, ['a','b,c']) SELECT TO_JSON_STRING(a) arr,COUNT(DISTINCT id) cnt FROM sample GROUP BY arr";
QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();

In your case, you could try:

WITH a AS (select split(product_category,".") as prd_cat,product_category from test_dataset.cosme_raw_table where product_link = "XXX") select TO_JSON_STRING(prd_cat) arr, product_category from a GROUP BY arr,product_category

Hope it helps.

Upvotes: 1

Related Questions