Reputation: 906
I want to extract the columns of a Spark SQL query without executing it. With parsePlan:
import org.apache.spark.sql.catalyst.plans.logical.Project

val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
logicalPlan.collect {
  case p: Project => p.projectList.map(_.name)
}.flatten
I was able to extract the list of columns. However, it doesn't work in the case of select * and throws an exception with the following message: An exception or error caused a run to abort: Invalid call to name on unresolved object, tree: *.
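Inspecting the parsed plan seems to show why: for a star query the projectList still contains an UnresolvedStar rather than named columns. A minimal sketch (the table name t is just a placeholder; since the plan is only parsed, not analyzed, the table does not even have to exist):

import org.apache.spark.sql.catalyst.analysis.UnresolvedStar
import org.apache.spark.sql.catalyst.plans.logical.Project

val starPlan = spark.sessionState.sqlParser.parsePlan("select * from t")
starPlan.collect {
  case p: Project =>
    // The star is still an UnresolvedStar expression at this point,
    // so calling .name on it throws the "unresolved object" exception.
    p.projectList.foreach(e => println(e.getClass.getSimpleName)) // prints UnresolvedStar
}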
Upvotes: 1
Views: 1297
Reputation: 14905
Without any form of execution, it is not possible for Spark to determine the columns. For example, if a table was loaded from a CSV file
spark.read.option("header", true).csv("data.csv").createOrReplaceTempView("csvTable")
then the query
select * from csvTable
could not return the column names without reading at least the first line of the CSV file.
Extracting a bit of code from Spark's explain command, the following lines get as close as possible to an answer to the question:
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.QueryExecution

val logicalPlan: LogicalPlan = spark.sessionState.sqlParser.parsePlan("select * from csvTable")
val queryExecution: QueryExecution = spark.sessionState.executePlan(logicalPlan)
val outputAttr: Seq[Attribute] = queryExecution.sparkPlan.output
val colNames: Seq[String] = outputAttr.map(_.name)
println(colNames)
If the file data.csv contains the columns a and b, the code prints
List(a, b)
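A sketch of a slight variation on the same setup: the analyzed logical plan already carries the resolved output attributes, so stopping at queryExecution.analyzed instead of building the physical sparkPlan should give the same names.

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.QueryExecution

// Sketch: the analyzed logical plan has been resolved against the catalog
// (and therefore the csv header), so its output already holds the column names.
val plan: LogicalPlan = spark.sessionState.sqlParser.parsePlan("select * from csvTable")
val qe: QueryExecution = spark.sessionState.executePlan(plan)
val namesFromAnalyzed: Seq[String] = qe.analyzed.output.map(_.name)
println(namesFromAnalyzed) // List(a, b) for the same data.csv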
Upvotes: 1