Reputation: 13656
I have a Spark DataFrame df1 with around 1000 columns, all of type String. Now I want to convert df1's column types from string to other types such as double, int, etc., based on conditions on the column names. For example, let's assume df1 has only three columns of string type:
df1.printSchema
col1_term1: String
col2_term2: String
col3_term3: String
The condition for changing a column's type is: if the column name contains term1, change it to int; if it contains term2, change it to double; and so on. I am new to Spark.
Upvotes: 4
Views: 7209
Reputation: 11607
While it wouldn't produce any different results than the solution proposed by @Psidom, you can also use a bit of Scala's syntactic sugar, like this:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DoubleType, IntegerType}

// Fold over the column names, casting each matching column as we go
val modifiedDf: DataFrame = originalDf.columns.foldLeft[DataFrame](originalDf) { (tmpDf: DataFrame, colName: String) =>
  if (colName.contains("term1")) tmpDf.withColumn(colName, tmpDf(colName).cast(IntegerType))
  else if (colName.contains("term2")) tmpDf.withColumn(colName, tmpDf(colName).cast(DoubleType))
  else tmpDf
}
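For example, applying this to the three-column sample from the question (a quick sketch; it assumes originalDf is that sample DataFrame with all-string columns):
modifiedDf.printSchema
// root
//  |-- col1_term1: integer (nullable = true)
//  |-- col2_term2: double (nullable = true)
//  |-- col3_term3: string (nullable = true)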
Upvotes: 1
Reputation: 214927
You can simply map over the columns and cast each one to the proper data type based on its name:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

val df = Seq(("1", "2", "3"), ("2", "3", "4")).toDF("col1_term1", "col2_term2", "col3_term3")

// Build the list of (possibly cast) columns based on each column name
val cols = df.columns.map(x => {
  if (x.contains("term1")) col(x).cast(IntegerType)
  else if (x.contains("term2")) col(x).cast(DoubleType)
  else col(x)
})
df.select(cols: _*).printSchema
root
|-- col1_term1: integer (nullable = true)
|-- col2_term2: double (nullable = true)
|-- col3_term3: string (nullable = true)
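If you want to keep working with the converted data, just assign the selection to a new DataFrame (the name typedDf below is only illustrative):
val typedDf = df.select(cols: _*)
typedDf.printSchema  // same schema as above, with the term1/term2 columns now integer/double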
Upvotes: 8