sunilgaikwad

Reputation: 19

How to read a BigQuery table from Java Spark with the BigQuery connector

I am trying to read a BigQuery table from Spark in Java, using the code below:

    BigQuerySQLContext bqSqlCtx = new BigQuerySQLContext(sqlContext);
    bqSqlCtx.setGcpJsonKeyFile("sxxxl-gcp-1x4c0xxxxxxx.json");
    bqSqlCtx.setBigQueryProjectId("winged-standard-2xxxx");
    bqSqlCtx.setBigQueryDatasetLocation("asia-east1");
    bqSqlCtx.setBigQueryGcsBucket("dataproc-9cxxxxx39-exxdc-4e73-xx07- 2258xxxx4-asia-east1");
    Dataset<Row> testds = bqSqlCtx.bigQuerySelect("select * from bqtestdata.customer_visits limit 100");

But I get the following error:

    19/01/14 10:52:01 WARN org.apache.spark.sql.SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
    19/01/14 10:52:01 INFO com.samelamin.spark.bigquery.BigQueryClient: Executing query select * from bqtestdata.customer_visits limit 100
    19/01/14 10:52:02 INFO com.samelamin.spark.bigquery.BigQueryClient: Creating staging dataset winged-standard-2xxxxx:spark_bigquery_staging_asia-east1

    Exception in thread "main" java.util.concurrent.ExecutionException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
    {
      "code" : 400,
      "errors" : [ {
        "domain" : "global",
        "message" : "Invalid dataset ID \"spark_bigquery_staging_asia-east1\". Dataset IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long.",
        "reason" : "invalid"
      } ],
      "message" : "Invalid dataset ID \"spark_bigquery_staging_asia-east1\". Dataset IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long.",
      "status" : "INVALID_ARGUMENT"
    }

Upvotes: 1

Views: 940

Answers (2)

AlphaCR

Reputation: 847

I had a similar problem with samelamin's Scala library. Apparently the library cannot handle locations other than US and EU, so it fails for datasets in asia-east1.

For now, I'm using the BigQuery Spark Connector to read and write my BigQuery data.

If you find a workaround that makes this library work, please share it.

Upvotes: 0

TT--

Reputation: 3205

The message in the response

Dataset IDs must be alphanumeric (plus underscores)...

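indicates that the dataset ID "spark_bigquery_staging_asia-east1" is invalid because it contains a hyphen, which comes from the location asia-east1. Judging by the "Creating staging dataset" line in the log, the library appears to build the staging dataset name by appending the configured dataset location, so any location with a hyphen in its name would produce an invalid ID.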
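
To make the rule concrete, here is a minimal Java sketch that checks an ID against the rule quoted in the error message (letters, digits and underscores, at most 1024 characters). The class `DatasetIdCheck` and both of its methods are hypothetical names made up for this illustration; they are not part of the samelamin library or of any BigQuery client, and the pattern assumes ASCII letters and digits only.

```java
import java.util.regex.Pattern;

class DatasetIdCheck {
    // Rule as quoted in the error message: alphanumeric plus underscores,
    // at most 1024 characters. Restricting to ASCII is an assumption.
    private static final Pattern VALID_ID = Pattern.compile("[A-Za-z0-9_]{1,1024}");

    // Returns true only if the whole ID matches the rule above.
    static boolean isValidDatasetId(String id) {
        return id != null && VALID_ID.matcher(id).matches();
    }

    // Hypothetical helper: replaces every disallowed character with an underscore.
    static String sanitize(String id) {
        return id.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        String staging = "spark_bigquery_staging_asia-east1";
        System.out.println(isValidDatasetId(staging)); // false, because of the hyphen
        System.out.println(sanitize(staging));         // spark_bigquery_staging_asia_east1
    }
}
```

Note that this only demonstrates why the request is rejected. The staging dataset name is generated inside the library, so a check like this in your own code does not by itself change what the library sends to BigQuery.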

Upvotes: 1
