Reputation: 8693
I am attempting to write a simple Scala program that will ultimately take an argument that is a comma-separated list of columns. The goal is simply to select that dynamic set of columns from a giant dataframe into a new dataframe. If I hard-code the list of columns, this works:
df.select((Array("colA","colB")).map(df(_)) : _* ).show
So now I'm trying to get from a string like "colA,colB" to that. Here's my latest effort:
val cols = "colA,colB"
//split to an array, end up with each element quoted
val colList = cols.split(",").mkString("'", "', '", "'")
df.select((Array(colList)).map(df(_)) : _* ).show
That gives me the following error:
org.apache.spark.sql.AnalysisException:
Cannot resolve column name "'colA', 'colB'" among (<actual column list>)
This is, of course, correct: there is no column with that name. I've tried a few other things, but I keep getting this error.
What am I doing wrong?
Upvotes: 0
Views: 597
Reputation: 2477
This is enough to select the right columns:
val cols = "colA,colB"
val colList = cols.split(",") // This is already the right Array
df.select(colList.map(df(_)) : _*).show
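If the argument might contain spaces after the commas (for example "colA, colB"), it may help to clean up the names before selecting. Here is a rough sketch in plain Scala that can be pasted into the REPL or spark-shell; parseCols is a hypothetical helper name, not something Spark provides:

```scala
// Hypothetical helper: turn a comma-separated argument into clean column names.
def parseCols(arg: String): Array[String] =
  arg.split(",")        // one element per comma-separated piece
    .map(_.trim)        // drop surrounding whitespace
    .filter(_.nonEmpty) // ignore empty pieces such as the one in "a,,b"

val names = parseCols("colA, colB")
println(names.mkString("[", "|", "]")) // prints [colA|colB]
```

With a DataFrame df in scope, names can then be used exactly like colList: df.select(names.map(df(_)) : _*).show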
Do not use:
val colList = cols.split(",")
val new_string = colList.mkString("'", "', '", "'") // This is the single String "'colA', 'colB'"
df.select(Array(new_string).map(df(_)) : _*).show // Error: looks up one column with that whole name
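To see why the second version fails, you can compare the two values without Spark at all. A minimal sketch for the Scala REPL; the variable names are only for illustration:

```scala
val cols = "colA,colB"

// split yields one array element per column name
val asArray: Array[String] = cols.split(",")

// mkString collapses everything back into a single String
val asString: String = asArray.mkString("'", "', '", "'")

// Wrapping that String in Array(...) gives a one-element array,
// so a single column with that whole text as its name is looked up
val wrapped: Array[String] = Array(asString)

println(asArray.length) // 2
println(wrapped.length) // 1
println(wrapped(0))     // 'colA', 'colB'
```

The quotes in the hard-coded Array("colA","colB") are just Scala string-literal syntax, so they never need to be added to the values themselves.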
Upvotes: 2