Andrew

Reputation: 8693

Select dynamic set of columns from dataframe

I am trying to write a simple Scala program that will ultimately take an argument containing a comma-separated list of column names. The goal is to select that dynamic set of columns from a very large dataframe into a new dataframe. If I hard-code the list of columns, this works:

df.select((Array("colA","colB")).map(df(_)) : _*  ).show

So now I'm trying to get from a string like "colA,colB" to that. Here's my latest effort:

val cols = "colA,colB"
//split to an array, end up with each element quoted
val colList = cols.split(",").mkString("'", "', '", "'")
df.select((Array(colList)).map(df(_)) : _*  ).show

That gives me the following error:

org.apache.spark.sql.AnalysisException: 
Cannot resolve column name "'colA', 'colB'" among (<actual column list>)

This is, of course, correct. There is no column named that. I've tried a few other different things, but I keep getting this error.
What am I doing wrong?

Upvotes: 0

Views: 597

Answers (1)

Pierre Gourseaud

Reputation: 2477

This is enough to select the right columns:

val cols = "colA,colB"
val colList = cols.split(",") // This is already the right Array
df.select(colList.map(df(_)) : _*).show
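
If the column list comes from a command-line argument, it may contain stray spaces or empty entries. Here is a minimal sketch of the same idea with that handled, assuming Spark SQL is on the classpath. The names `SelectDynamicColumns` and `selectByNames` and the sample data are made up for illustration; only `split`, `select`, and `functions.col` are existing APIs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectDynamicColumns {
  // Hypothetical helper: turns "colA, colB" into a select over those columns.
  def selectByNames(df: DataFrame, csv: String): DataFrame = {
    // Split on commas, drop surrounding whitespace and empty entries.
    val names = csv.split(",").map(_.trim).filter(_.nonEmpty)
    // Map each name to a Column and expand the array as varargs.
    df.select(names.map(col): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-dynamic-columns")
      .getOrCreate()
    import spark.implicits._

    // Small stand-in for the real dataframe (assumed column names).
    val df = Seq((1, "x", true), (2, "y", false)).toDF("colA", "colB", "colC")

    selectByNames(df, "colA, colB").show()
    spark.stop()
  }
}
```

Using `col(name)` instead of `df(name)` should behave the same for a plain select like this; `df(name)` mainly matters when you need to disambiguate columns after a join.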

Do not use:

val colList = cols.split(",")
val new_string = colList.mkString("'", "', '", "'") // This is "'colA', 'colB'"
df.select(Array(new_string).map(df(_)) : _*).show // Error: AnalysisException, there is no column with that name
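
To see why the second version fails, it can help to look at the values without Spark at all. This is a small plain-Scala sketch (the object name `SplitVsMkString` is just for illustration): `split` already returns one array element per column name, while `mkString` joins them back into a single string, so wrapping that string in `Array(...)` gives a one-element array.

```scala
object SplitVsMkString {
  val cols = "colA,colB"

  // split returns an Array[String] with one element per column name.
  val names: Array[String] = cols.split(",")

  // mkString joins the elements back into ONE String, quotes included.
  val quoted: String = names.mkString("'", "', '", "'")

  def main(args: Array[String]): Unit = {
    println(names.length)         // 2
    println(quoted)               // 'colA', 'colB'
    println(Array(quoted).length) // 1, a single "column name"
  }
}
```

That single element, `'colA', 'colB'`, is exactly the name Spark reports it cannot resolve in the error message from the question.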

Upvotes: 2
