Leothorn

Reputation: 1345

Cast all columns of a specific datatype into another datatype programmatically in Scala Spark

I am trying to convert the datatypes of columns programmatically and am running into some coding issues.

I modified the code used here for this.

Data >> all numeric columns are being read in as strings.

Code >>

import org.apache.spark.sql
raw_data.schema.fields
    .collect({case x if x.dataType.typeName == "string" => x.name})
    .foldLeft(raw_data)({case(dframe,field) => dframe(field).cast(sql.types.IntegerType)})

Error >>

<console>:75: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
           .foldLeft(raw_data)({case(dframe,field) => dframe(field).cast(sql.types.IntegerType)})

Upvotes: 1

Views: 84

Answers (1)

Shaido

Reputation: 28322

The problem is that the result of dframe(field).cast(sql.types.IntegerType) in the foldLeft is a Column; however, to continue the iteration a DataFrame is expected. In the link where the code originally comes from, dframe.drop(field) is used, which does return a DataFrame and hence works.
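The constraint comes from foldLeft itself. Its signature in the Scala collections API is:

def foldLeft[B](z: B)(op: (B, A) => B): B

Here B is fixed to DataFrame by the raw_data seed value, so op must return a DataFrame at every step; a Column cannot be threaded through the fold.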

To fix this, simply use withColumn, which replaces the specified column and then returns the whole dataframe:

foldLeft(raw_data)({ case (dframe, field) =>
  dframe.withColumn(field, dframe(field).cast(sql.types.IntegerType))
})
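For reference, here is a minimal self-contained sketch of the same pattern (the SparkSession setup, the toy raw_data frame, and the column names are made up for illustration, not the asker's exact setup):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

object CastStringColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical input where every column was read in as a string.
    val raw_data = Seq(("1", "10"), ("2", "20")).toDF("id", "amount")

    // Collect the names of all string-typed columns, then fold over them,
    // replacing each one with an integer-cast copy via withColumn.
    val casted = raw_data.schema.fields
      .collect { case f if f.dataType.typeName == "string" => f.name }
      .foldLeft(raw_data) { (dframe, field) =>
        dframe.withColumn(field, dframe(field).cast(IntegerType))
      }

    casted.printSchema() // id and amount are now IntegerType

    spark.stop()
  }
}

Note that cast is lenient: any string value that cannot be parsed as an integer becomes null rather than raising an error.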

Upvotes: 2
