Reputation: 31
I am trying to add many more columns to a DataFrame, each derived from its existing columns. However, Spark DataFrames are immutable, which makes it difficult to do this iteratively. So I wrote a for loop that outputs a string holding the entire chain of calls I want to apply to the Spark DataFrame (see the sample code below).
val train_df = sqlContext.sql("select * from someTable")
/* The for loop produces a string like Str below */
var Str = ".withColumn(\"newCol1\",$\"col1\").withColumn(\"newCol2\",$\"col2\").withColumn(\"newCol3\",$\"col3\")"
/* Below is what I am trying to do (this does not compile: Str is a String value, not a member of DataFrame) */
val train_df_new = train_df.Str
So, how can I save the expression in a string and reuse it in Scala/Spark to add all of those new columns to a new DataFrame at once?
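Scala does not evaluate a String as code, so the chain of calls cannot be stored as text and applied later. A common alternative is to store each step as a function value and thread the DataFrame through them with foldLeft (or Function.chain). The sketch below is runnable without Spark: Frame is a hypothetical stand-in that only models column names, and its withColumn merely mimics the shape of DataFrame.withColumn.

```scala
// Hypothetical stand-in for a Spark DataFrame: only the column names are modeled.
final case class Frame(columns: Vector[String]) {
  // Mimics the shape of DataFrame.withColumn: returns a new Frame, never mutates this one.
  // If the name already exists it stays in place (Spark would replace that column's values).
  def withColumn(name: String, source: String): Frame = {
    require(columns.contains(source), s"unknown column: $source")
    if (columns.contains(name)) this else copy(columns = columns :+ name)
  }
}

object StepsDemo {
  // Each step is a function value instead of a fragment of source code in a String.
  val steps: Seq[Frame => Frame] =
    Seq("col1", "col2", "col3").zipWithIndex.map { case (src, i) =>
      (f: Frame) => f.withColumn(s"newCol${i + 1}", src)
    }

  // foldLeft threads the immutable value through every step, in order.
  def applyAll(start: Frame): Frame =
    steps.foldLeft(start)((acc, step) => step(acc))

  // Equivalent: compose all steps into a single function with Function.chain.
  def applyAllChained(start: Frame): Frame =
    Function.chain(steps)(start)

  def main(args: Array[String]): Unit = {
    val trainDf = Frame(Vector("col1", "col2", "col3"))
    // Prints: col1, col2, col3, newCol1, newCol2, newCol3
    println(applyAll(trainDf).columns.mkString(", "))
  }
}
```

With a real DataFrame the steps would have type DataFrame => DataFrame, and Dataset.transform can apply a single such function inline.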
Upvotes: 0
Views: 729
Reputation: 28332
Use a foldLeft instead. Here, a Map from the old to the new column names is used:
val m = Map(("col1", "newCol1"), ("col2", "newCol2"), ("col3", "newCol3"))
val train_df_new = m.keys.foldLeft(train_df)((df, c) => df.withColumnRenamed(c, m(c)))
Instead of withColumnRenamed, any DataFrame method that returns a new DataFrame (for example withColumn) can be applied iteratively this way.
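The question copies existing columns into new ones rather than renaming them, so a sketch of the same foldLeft with withColumn may help. It assumes spark-sql is on the classpath and a local SparkSession; the toy data is hypothetical. A Seq of pairs is used instead of a Map because an immutable Scala Map with more than four entries does not preserve insertion order, which would make the order of the new columns unpredictable. The second method shows an alternative that adds all columns in one select.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AddColumnsFold {
  // (newColumnName, sourceColumnName) pairs, in the order the new columns should appear.
  val newCols: Seq[(String, String)] =
    Seq("newCol1" -> "col1", "newCol2" -> "col2", "newCol3" -> "col3")

  // withColumn returns a new DataFrame; foldLeft passes it on to the next call.
  def addColumns(df: DataFrame): DataFrame =
    newCols.foldLeft(df) { case (acc, (newName, srcName)) =>
      acc.withColumn(newName, col(srcName))
    }

  // Alternative: keep every existing column and add all new ones in a single projection.
  def addColumnsSelect(df: DataFrame): DataFrame = {
    val allCols: Seq[Column] =
      col("*") +: newCols.map { case (newName, srcName) => col(srcName).as(newName) }
    df.select(allCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("add-columns").getOrCreate()
    import spark.implicits._
    val trainDf = Seq((1, "a", 2.0), (2, "b", 3.0)).toDF("col1", "col2", "col3")
    addColumns(trainDf).show()
    spark.stop()
  }
}
```

Both methods produce the same columns. The Spark documentation for withColumn notes that each call adds a projection to the plan, so for a large number of columns the single select is usually preferable to a long withColumn chain.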
Upvotes: 2