User9102d82

Reputation: 1190

How to check if big query table exists with spark/scala

I need to do certain operation on a set of BQ tables but I want to do the operation if and only if I know for certain that all the BQ tables exist.

I have checked the google big query package and it has a sample to read the data from BQ tables - fine But what if my tables are really huge? I can't load all the tables for existence check as it would take too much time and seems redundant.

Is there another way to achieve this? I would be very glad if I could get some pointers in the right direction.

Thank you in advance.

Gaurav

Upvotes: 1

Views: 1341

Answers (2)

dre-hh

Reputation: 8044

spark.read.option(...).load will not load all the objects into a dataframe. spark.read.option(...) returns a DataFrameReader. When you call load on it, it will test the connection and issue a query like

SELECT * FROM (select * from objects) SPARK_GEN_SUBQ_11 WHERE 1=0

The query will not scan any records and will error out when the table does not exist. I am not sure about the BigQuery driver, but JDBC drivers throw a Java exception here, which you need to handle in a try {} catch {} block.

Thus you can just call load, catch exceptions, and check whether all dataframes could be instantiated. Here is some example code:

def query(q: String) = {
  val reader = spark.read.format("bigquery").option("query", q)
  try {
    Some(reader.load())
  } catch {
    case e: Exception => None
  }
}

val dfOpts = Seq(
  query("select * from foo"),
  query("select * from bar"),
  query("select * from baz")
)

if (dfOpts.exists(_.isEmpty)) {
  println("Some table is missing")
}

Upvotes: 2

i0707

Reputation: 617

You could use the method tables.get

https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get
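As a sketch of that approach from Scala, using the google-cloud-bigquery Java client library (assuming it is on the classpath; the dataset and table names below are placeholders):

```scala
import com.google.cloud.bigquery.{BigQueryOptions, TableId}

// Existence check via the client-library equivalent of tables.get:
// getTable returns null when the table does not exist, and no table
// data is scanned, only metadata is fetched.
object TableCheck {
  def tableExists(dataset: String, table: String): Boolean = {
    val bigquery = BigQueryOptions.getDefaultInstance.getService
    bigquery.getTable(TableId.of(dataset, table)) != null
  }

  def main(args: Array[String]): Unit = {
    // "my_dataset" and the table names are placeholders
    val tables = Seq("foo", "bar", "baz")
    if (tables.forall(t => tableExists("my_dataset", t)))
      println("all tables exist")
    else
      println("some table is missing")
  }
}
```

This only touches table metadata, so it stays cheap regardless of how large the tables are.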

Otherwise, you can run a bq CLI command in a bash script, which can be called from your spark program.
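For example (a sketch; the project/dataset/table references are placeholders — bq show exits non-zero when a table does not exist):

```shell
#!/usr/bin/env bash
# Check a list of BQ tables with the bq CLI; exit 1 if any is missing.
tables="my-project:my_dataset.foo my-project:my_dataset.bar"

for t in $tables; do
  # bq show fetches only table metadata and returns a non-zero
  # exit code when the table is not found
  if ! bq show --format=none "$t" > /dev/null 2>&1; then
    echo "missing table: $t"
    exit 1
  fi
done
echo "all tables exist"
```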

Upvotes: 1
