Reputation: 63
I have a PySpark snippet that I need to convert to Scala.
PySpark
from pyspark.sql import functions as F

for i in [c for c in r.columns if c.startswith("_")]:
    r = r.withColumn(i, F.col(i)["id"])  # keep only the "id" field of each struct
Since Scala vals are immutable, is there a better way in Scala to create multiple new columns without chaining val df1 = df.withColumn(...), val df2 = df1.withColumn(...), and so on, the way I looped in PySpark?
Table r is shown below; a minimal snippet to rebuild it follows the table.
+-----------+-------------+-------------+-------------+-------------+
| _0| _1| _2| _3| _4|
+-----------+-------------+-------------+-------------+-------------+
|[1, Carter]| [5, Banks]|[11, Derrick]| [4, Hood]| [12, Jef]|
|[1, Carter]| [12, Jef]| [4, Hood]| [5, Banks]|[11, Derrick]|
|[1, Carter]| [4, Hood]| [12, Jef]|[11, Derrick]| [5, Banks]|
|[1, Carter]| [12, Jef]| [5, Banks]|[11, Derrick]| [4, Hood]|
|[1, Carter]| [4, Hood]| [12, Jef]| [5, Banks]|[11, Derrick]|
|[1, Carter]|[11, Derrick]| [12, Jef]| [4, Hood]| [5, Banks]|
|[1, Carter]| [12, Jef]|[11, Derrick]| [5, Banks]| [4, Hood]|
|[1, Carter]| [5, Banks]| [4, Hood]|[11, Derrick]| [12, Jef]|
|[1, Carter]|[11, Derrick]| [5, Banks]| [4, Hood]| [12, Jef]|
|[1, Carter]| [5, Banks]|[11, Derrick]| [12, Jef]| [4, Hood]|
|[1, Carter]| [5, Banks]| [12, Jef]|[11, Derrick]| [4, Hood]|
|[1, Carter]| [5, Banks]| [12, Jef]| [4, Hood]|[11, Derrick]|
|[1, Carter]|[11, Derrick]| [5, Banks]| [12, Jef]| [4, Hood]|
|[1, Carter]| [4, Hood]|[11, Derrick]| [5, Banks]| [12, Jef]|
|[1, Carter]|[11, Derrick]| [4, Hood]| [5, Banks]| [12, Jef]|
|[1, Carter]| [12, Jef]| [5, Banks]| [4, Hood]|[11, Derrick]|
|[1, Carter]| [12, Jef]|[11, Derrick]| [4, Hood]| [5, Banks]|
|[1, Carter]| [4, Hood]|[11, Derrick]| [12, Jef]| [5, Banks]|
|[1, Carter]|[11, Derrick]| [4, Hood]| [12, Jef]| [5, Banks]|
|[1, Carter]| [12, Jef]| [4, Hood]|[11, Derrick]| [5, Banks]|
+-----------+-------------+-------------+-------------+-------------+
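For reference, here is a minimal sketch that rebuilds a DataFrame of this shape (first two rows only). Only the id field is confirmed by the question's F.col(i)["id"] access; calling the second struct field name is an assumption:

import org.apache.spark.sql.SparkSession

// Hypothetical struct layout: "id" comes from the question, "name" is a guess
case class Person(id: Int, name: String)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Case-class fields become struct fields, so each cell is a struct<id:int, name:string>
val r = Seq(
  (Person(1, "Carter"), Person(5, "Banks"), Person(11, "Derrick"), Person(4, "Hood"), Person(12, "Jef")),
  (Person(1, "Carter"), Person(12, "Jef"), Person(4, "Hood"), Person(5, "Banks"), Person(11, "Derrick"))
).toDF("_0", "_1", "_2", "_3", "_4")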
Upvotes: 0
Views: 884
Reputation: 726
You can use foldLeft:
import org.apache.spark.sql.functions.col

// Fold over the matching column names, threading the DataFrame as the accumulator
val updDf = df
  .columns
  .filter(_.startsWith("_"))
  .foldLeft(df)((acc, c) => acc.withColumn(s"new_$c", col(c).getItem("id")))
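Note that this adds new_-prefixed columns; to overwrite the existing columns in place, as the PySpark loop does, reuse the original name (a small variation of the same sketch):

// Same foldLeft, but writing back under the original column name
val replaced = df
  .columns
  .filter(_.startsWith("_"))
  .foldLeft(df)((acc, c) => acc.withColumn(c, col(c).getItem("id")))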
Upvotes: 1
Reputation: 1322
You can do it with a single select (each .withColumn call creates a new Dataset that the analyzer has to resolve, so chaining many of them adds overhead):
import org.apache.spark.sql.functions.col

// Either replace the struct column with its inner "id" field, or take the column as is
val updates = r.columns.map(c => if (c.startsWith("_")) col(s"$c.id") as c else col(c))
val newDf = r.select(updates: _*) // ": _*" expands the sequence into a varargs parameter list
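If you are on Spark 3.3 or later, Dataset.withColumns can apply several replacements in one call as well; a sketch under the same assumptions (verify the method exists in your version):

// Build a name -> expression map and apply it in a single pass
val colsMap = r.columns
  .filter(_.startsWith("_"))
  .map(c => c -> col(c).getItem("id"))
  .toMap
val newDf2 = r.withColumns(colsMap)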
Upvotes: 0