Adel

Reputation: 181

How to traverse the columns of a Dataset in Spark?

I want to change the schema of all the columns of a Spark Dataset in Scala; pseudo code is like this:

    val mydataset = ...
    for (col_t <- mydataset.columns) {
        if (col_t.name.startsWith("AA")) col_t.nullable = true
        if (col_t.name.startsWith("BB")) col_t.name += "CC"
    }

It is supposed to update the name and nullable property of each column (or all of them) depending on a criterion.

Upvotes: 0

Views: 421

Answers (2)

Gourav Dutta

Reputation: 553

You have to use df.schema to achieve this.

The code is as follows.

import org.apache.spark.sql.types.{ StructField, StructType }
import org.apache.spark.sql.{ DataFrame, SQLContext }

val newSchema = StructType(df.schema.map {
      case StructField(c, t, _, m) if c.startsWith("AA") => StructField(c, t, nullable = true, m)
      case StructField(c, t, n, m) if c.startsWith("BB") => StructField(c + "CC", t, n, m)
      case y: StructField => y
    })
val newDf = df.sqlContext.createDataFrame(df.rdd, newSchema)
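For what it's worth, here is a self-contained sketch of the same idea, with the schema rewrite factored into a helper so it can be tested without a SparkSession; the column names (`AAx`, `BBy`, `ZZz`) and types are made up for illustration:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaRewrite {
  // Force AA* columns nullable and rename BB* columns by appending "CC";
  // leave every other field untouched.
  def rewriteSchema(schema: StructType): StructType =
    StructType(schema.map {
      case StructField(c, t, _, m) if c.startsWith("AA") => StructField(c, t, nullable = true, m)
      case StructField(c, t, n, m) if c.startsWith("BB") => StructField(c + "CC", t, n, m)
      case other => other
    })

  def main(args: Array[String]): Unit = {
    // Hypothetical input schema, just for illustration.
    val schema = StructType(Seq(
      StructField("AAx", IntegerType, nullable = false),
      StructField("BBy", StringType,  nullable = true),
      StructField("ZZz", StringType,  nullable = true)))

    val rewritten = rewriteSchema(schema)
    rewritten.foreach(f => println(s"${f.name}: nullable=${f.nullable}"))
  }
}
```

With a real DataFrame you would then rebuild it as `spark.createDataFrame(df.rdd, SchemaRewrite.rewriteSchema(df.schema))`.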

Hope this helps.

Upvotes: 0

philantrovert

Reputation: 10082

You can use df.schema to get the current schema of the dataframe, map over it, apply your conditions, and use the result as the new schema for your original dataframe.

import org.apache.spark.sql.types._

val newSchema = df.schema.map { case StructField(name, datatype, nullable, metadata) =>
    if (name.startsWith("AA")) StructField(name, datatype, nullable = true, metadata)
    else if (name.startsWith("BB")) StructField(name + "CC", datatype, nullable = true, metadata)
    // more conditions here
    else StructField(name, datatype, nullable, metadata)
}

This will return a Seq[StructField].
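As a side note, every branch of that map has to produce a StructField; an `if` without a matching `else` would make unmatched columns evaluate to `Unit`. The rename/flag pattern itself is plain Scala, as this tiny runnable sketch shows (the names are made up, and a `(name, nullable)` pair stands in for a StructField):

```scala
object RenamePattern {
  // Each (name, nullable) pair stands in for a column's StructField.
  def rewrite(cols: Seq[(String, Boolean)]): Seq[(String, Boolean)] =
    cols.map {
      case (name, _) if name.startsWith("AA") => (name, true)     // force nullable
      case (name, n) if name.startsWith("BB") => (name + "CC", n) // rename
      case other => other                                         // leave untouched
    }

  def main(args: Array[String]): Unit = {
    println(rewrite(Seq(("AAx", false), ("BBy", false), ("ZZz", true))))
    // List((AAx,true), (BByCC,false), (ZZz,true))
  }
}
```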

To apply it to your original DataFrame (df):

val newDf = spark.createDataFrame(df.rdd, StructType(newSchema) )

Upvotes: 1
