Reputation: 1345
I am trying to programmatically convert the datatypes of columns and am running into some coding issues.
I modified the code used here for this.
Data >> any numbers are being read as strings.
Code >>
import org.apache.spark.sql
raw_data.schema.fields
.collect({case x if x.dataType.typeName == "string" => x.name})
.foldLeft(raw_data)({case(dframe,field) => dframe(field).cast(sql.types.IntegerType)})
Error >>
<console>:75: error: type mismatch;
found : org.apache.spark.sql.Column
required: org.apache.spark.sql.DataFrame
(which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
.foldLeft(raw_data)({case(dframe,field) => dframe(field).cast(sql.types.IntegerType)})
Upvotes: 1
Views: 84
Reputation: 28322
The problem is that the result of dframe(field).cast(sql.types.IntegerType) inside the foldLeft is a Column, but to continue the iteration a DataFrame is expected. In the linked code this snippet was adapted from, dframe.drop(field) is used instead, which does return a DataFrame and hence works.
To fix this, use withColumn, which replaces the named column and returns the whole DataFrame:
raw_data.schema.fields.collect({ case x if x.dataType.typeName == "string" => x.name })
  .foldLeft(raw_data)({ case (dframe, field) => dframe.withColumn(field, dframe(field).cast(sql.types.IntegerType)) })
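For completeness, here is a minimal end-to-end sketch of the fix (assuming a local SparkSession; the names spark, raw_data, x, and y are illustrative, not from the original post):

import org.apache.spark.sql
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("cast-strings").getOrCreate()
import spark.implicits._

// Illustrative data: numeric values stored as strings.
val raw_data = Seq(("1", "2"), ("3", "4")).toDF("x", "y")

// Cast every string-typed column to IntegerType via withColumn,
// threading the DataFrame through the fold.
val converted = raw_data.schema.fields
  .collect({ case x if x.dataType.typeName == "string" => x.name })
  .foldLeft(raw_data)({ case (dframe, field) =>
    dframe.withColumn(field, dframe(field).cast(sql.types.IntegerType))
  })

converted.printSchema()
// root
//  |-- x: integer (nullable = true)
//  |-- y: integer (nullable = true)

Note that cast returns null for values that cannot be parsed as integers, so any genuinely non-numeric string column would end up as a column of nulls.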
Upvotes: 2