Reputation: 1190
I need to perform a certain operation on a set of BQ tables, but I only want to do it if I know for certain that all of the tables exist.
I have checked the Google BigQuery package, and it has a sample for reading data from BQ tables, which is fine. But what if my tables are really huge? I can't load all the tables just for an existence check, as it would take too much time and seems redundant.
Is there another way to achieve this? I would be very glad if I could get some pointers in the right direction.
Thank you in advance.
Gaurav
Upvotes: 1
Views: 1341
Reputation: 8044
spark.read.option(...).load will not load all the objects into a DataFrame. spark.read.option(...) returns a DataFrameReader. When you call load on it, it will test the connection and issue a query like

SELECT * FROM (select * from objects) SPARK_GEN_SUBQ_11 WHERE 1=0

The query will not scan any records and will error out when the table does not exist. I am not sure about the BigQuery driver, but JDBC drivers throw a Java exception here, which you need to handle in a try {} catch {} block.
Thus you can just call load, catch exceptions, and check whether all DataFrames could be instantiated. Here is some example code:
import org.apache.spark.sql.DataFrame

def query(q: String): Option[DataFrame] = {
  val reader = spark.read.format("bigquery").option("query", q)
  try {
    // load() only issues the zero-row probe query, so no table data is scanned
    Some(reader.load())
  } catch {
    // load() throws when the table does not exist (or the connection fails)
    case e: Exception => None
  }
}
val dfOpts = Seq(
  query("select * from foo"),
  query("select * from bar"),
  query("select * from baz")
)

if (dfOpts.exists(_.isEmpty)) {
  println("Some table is missing")
}
Upvotes: 2
Reputation: 617
You could use the tables.get method:
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get
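For example, here is a minimal sketch that calls the same API through the google-cloud-bigquery Java client from Scala (the dataset and table names are placeholders, not from the question). getTable returns null for a missing table, so existence can be checked without reading any rows:

import com.google.cloud.bigquery.{BigQuery, BigQueryOptions, TableId}

// Uses application default credentials; assumes google-cloud-bigquery is on the classpath.
val bq: BigQuery = BigQueryOptions.getDefaultInstance.getService

// getTable returns null (rather than throwing) when the table does not exist.
def tableExists(dataset: String, table: String): Boolean =
  Option(bq.getTable(TableId.of(dataset, table))).isDefined

// "my_dataset" and the table names below are placeholders for illustration.
val allExist = Seq("foo", "bar", "baz").forall(tableExists("my_dataset", _))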
Otherwise, you can run a bq CLI command in a bash script, which can be called from your Spark program.
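For instance, bq show exits with a non-zero status when the table does not exist, so the exit code alone can serve as the check. A rough sketch invoking it from Scala (assumes the bq tool is installed and authenticated on the machine running the driver; the table spec is a placeholder):

import scala.sys.process._

// `bq show dataset.table` prints table metadata and exits non-zero if the table is missing.
def tableExistsViaCli(tableSpec: String): Boolean =
  Seq("bq", "show", tableSpec).! == 0

val ok = Seq("my_dataset.foo", "my_dataset.bar").forall(tableExistsViaCli)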
Upvotes: 1