Reputation: 181
I want to change the column schema of a Spark Dataset in Scala; pseudo code is like this:
val mydataset = ...
for (col_t <- mydataset.columns) {
  if (col_t.name.startsWith("AA")) col_t.nullable = true
  if (col_t.name.startsWith("BB")) col_t.name += "CC"
}
It is supposed to update the column name and nullable property of each column that matches a criterion.
Upvotes: 0
Views: 421
Reputation: 553
You have to use df.schema to achieve this.
The code is as follows:
import org.apache.spark.sql.types.{ StructField, StructType }

// Map over the existing schema, rewriting the fields that match.
val newSchema = StructType(df.schema.map {
  case StructField(c, t, _, m) if c.startsWith("AA") => StructField(c, t, nullable = true, m)
  case StructField(c, t, n, m) if c.startsWith("BB") => StructField(c + "CC", t, n, m)
  case y: StructField => y
})

// Re-create the DataFrame from the same rows with the new schema.
val newDf = df.sqlContext.createDataFrame(df.rdd, newSchema)
Hope this helps.
Upvotes: 0
Reputation: 10082
You can use df.schema to get the current schema of the dataframe, map over it, apply your conditions, and apply the result back on top of your original dataframe.
import org.apache.spark.sql.types._

val newSchema = df.schema.map {
  case sf @ StructField(name, datatype, nullable, metadata) =>
    if (name.startsWith("AA")) StructField(name, datatype, nullable = true, metadata)
    else if (name.startsWith("BB")) StructField(name + "CC", datatype, nullable = true, metadata)
    // more conditions here
    else sf
}
This will return a Seq[StructField].
To apply it to your original DataFrame (df):
val newDf = spark.createDataFrame(df.rdd, StructType(newSchema))
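For context, here is a minimal self-contained sketch of the whole round trip. The column names AA_id and BB_val and the local SparkSession are hypothetical stand-ins for the asker's real data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructField, StructType}

object SchemaRewrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-rewrite")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; AA_id and BB_val stand in for the real columns.
    val df = Seq((1, "x"), (2, "y")).toDF("AA_id", "BB_val")

    // Rewrite the schema: force AA* columns to nullable, append "CC" to BB* names.
    val newSchema = StructType(df.schema.map {
      case StructField(n, t, _, m)  if n.startsWith("AA") => StructField(n, t, nullable = true, m)
      case StructField(n, t, nl, m) if n.startsWith("BB") => StructField(n + "CC", t, nl, m)
      case other => other
    })

    // Re-apply the rewritten schema over the same rows.
    val newDf = spark.createDataFrame(df.rdd, newSchema)
    newDf.printSchema() // AA_id becomes nullable; BB_val is renamed to BB_valCC

    spark.stop()
  }
}
```

Note that withColumnRenamed would handle the rename alone, but it cannot change nullability, which is why both answers recreate the DataFrame from df.rdd with a rewritten schema.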
Upvotes: 1