javadev
javadev

Reputation: 287

How to execute spark sql multiline query when stored as a string variable?

I have code like so with a multiline query

  val hiveInsertIntoTable = spark.read.text(fileQuery).collect()
  hiveInsertIntoTable.foreach(println)

  val actualQuery = hiveInsertIntoTable(0).mkString
  println(actualQuery)


  spark.sql(s"truncate table $tableTruncate")
  spark.sql(actualQuery)

Whenever I try to execute actual query I get an error.

org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input '<EOF>'(line 1, pos 52)
== SQL ==
insert into wera_tacotv_esd.lac_asset_table_pb_hive

----------------------------------------------------^^^

and the end of the query  .... ;    (terminates in a ;)

The query is actually about 450 lines

I tried to wrap the variable in triple quotes but that didn't work either.

Any help is appreciated.

I am using spark 2.1 and scala 2.11

Upvotes: 1

Views: 1518

Answers (1)

user9831128
user9831128

Reputation:

Three problems:

  • hiveInsertIntoTable is an Array[org.apache.spark.sql.Row] - not very useful structure.
  • You take only the first row of it hiveInsertIntoTable(0)
  • Even if you took all rows, concatenating with empty string (.mkString) wouldn't work well.

Either:

val actualQuery = spark.read.text(path).as[String].collect.mkString("\n")

or

val actualQuery = spark.sparkContext.wholeTextFiles(path).values.first()

Upvotes: 1

Related Questions