Anoop Deshpande

Reputation: 582

How to access BigQuery using Spark which is running outside of GCP

I'm trying to connect a Spark job running in a private datacenter to BigQuery. I created a service account, obtained its private JSON key, and granted it read access to the dataset I want to query. But when I try integrating with Spark, I receive `User does not have bigquery.tables.create permission for dataset xxx:yyy`. Do we need the create-table permission just to read data from a table in BigQuery?

Below is the response that gets printed on the console:

{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Access Denied: Dataset xxx:yyy: User does not have bigquery.tables.create permission for dataset xxx:yyy.",
    "reason" : "accessDenied"
  } ],
  "message" : "Access Denied: Dataset xxx:yyy: User does not have bigquery.tables.create permission for dataset xxx:yyy.",
  "status" : "PERMISSION_DENIED"
}

Below is the Spark code I'm using to access BigQuery:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ConnectionTester extends App {

  val session = SparkSession.builder()
    .appName("big-query-connector")
    .config(getConf)
    .getOrCreate()

  session.read
    .format("bigquery")
    .option("viewsEnabled", true)
    .load("xxx.yyy.table1")
    .select("col1")
    .show(2)

  private def getConf: SparkConf = {
    val sparkConf = new SparkConf
    sparkConf.setAppName("big-query-connector")
    sparkConf.setMaster("local[*]")
    sparkConf.set("parentProject", "my-gcp-project")
    sparkConf.set("credentialsFile", "<path to my credentialsFile>")
    sparkConf
  }
}

Upvotes: 1

Views: 1643

Answers (2)

David Rabinowitz

Reputation: 30448

Reading a regular table does not require the bigquery.tables.create permission. However, the code sample you've provided hints that the table is actually a BigQuery view. BigQuery views are logical references; they are not materialized on the server side, so for Spark to read them they first need to be materialized into a temporary table. Creating this temporary table is what requires the bigquery.tables.create permission.
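If granting create permission on the source dataset is not an option, the connector can materialize the view into a different dataset via its materializationProject/materializationDataset options. A minimal sketch, assuming `session` from the question and a placeholder scratch dataset (`my_scratch_dataset`) where the service account does hold bigquery.tables.create:

```scala
// Sketch: read a BigQuery view, but create the temporary
// materialization table in a dataset the service account can write to.
// "my_scratch_dataset" is a placeholder, not from the question.
session.read
  .format("bigquery")
  .option("viewsEnabled", true)
  .option("materializationProject", "my-gcp-project")
  .option("materializationDataset", "my_scratch_dataset")
  .load("xxx.yyy.table1")
  .select("col1")
  .show(2)
```

This keeps the source dataset read-only for the service account; only the scratch dataset needs write access.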

Upvotes: 2

s.polam

Reputation: 10382

Check the code below.

Credentials

val credentials = """
{
  "type": "service_account",
  "project_id": "your project id",
  "private_key_id": "your private_key_id",
  "private_key": "-----BEGIN PRIVATE KEY-----\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n-----END PRIVATE KEY-----\n",
  "client_email": "[email protected]",
  "client_id": "111111111111111111111111111",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/xxxxx40vvvvvv.iam.gserviceaccount.com"
}
"""

Encode it in Base64 and pass it to the Spark conf.

// Base64-encode the credentials JSON so it can be passed as a single conf value.
def base64(data: String): String = {
  import java.nio.charset.StandardCharsets
  import java.util.Base64
  Base64.getEncoder.encodeToString(data.getBytes(StandardCharsets.UTF_8))
}

spark.conf.set("credentials", base64(credentials))

spark
  .read
  .option("parentProject", "parentProject")
  .option("table", "dataset.table")
  .format("bigquery")
  .load()

Upvotes: 0
