Rahul Sharma

Reputation: 5834

How to cast all columns of Spark dataset to string using Java

I have a dataset with many columns, and I want to cast all of them to string using Java.

I tried the approach below. Is there a better way to achieve this?

Dataset<Row> ds = ...;
JavaRDD<String[]> stringArrRDD = ds.javaRDD().map(row -> {
    int length = row.length();
    String[] columns = new String[length];
    for (int i = 0; i < length; i++) {
        // Null values become empty strings
        columns[i] = row.get(i) != null ? row.get(i).toString() : "";
    }
    return columns;
});

Upvotes: 1

Views: 8338

Answers (2)

roizaig

Reputation: 145

If you would rather pass a DataType object than a type name string:

import org.apache.spark.sql.types.*;
...
for (String c: ds.columns()) {
    ds = ds.withColumn(c, ds.col(c).cast(DataTypes.StringType));
}

Upvotes: 2

Alper t. Turker

Reputation: 35229

You can iterate over columns:

for (String c: ds.columns()) {
    ds = ds.withColumn(c, ds.col(c).cast("string"));
}
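
As a variation on the loop above, the casts can also be expressed as a single projection by building one SQL expression per column and passing them all to selectExpr. The sketch below only builds the expression strings; CastExprs and castExprs are hypothetical names chosen for illustration, not part of Spark. It assumes column names may contain dots or backticks, so each name is backtick-quoted.

```java
// Sketch only: CastExprs and castExprs are hypothetical names, not part of Spark.
final class CastExprs {
    private CastExprs() {}

    // Builds one "CAST(`col` AS STRING) AS `col`" expression per column name.
    static String[] castExprs(String[] columns) {
        String[] exprs = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            // Backtick-quote the name; a literal backtick is escaped by doubling it.
            String quoted = "`" + columns[i].replace("`", "``") + "`";
            exprs[i] = "CAST(" + quoted + " AS STRING) AS " + quoted;
        }
        return exprs;
    }
}
```

With a Dataset<Row> in hand, this would be used as ds = ds.selectExpr(CastExprs.castExprs(ds.columns())); which keeps the original column names. Note that a cast leaves null values as null, whereas the code in the question turns them into empty strings; if that behavior is wanted, following the cast with ds.na().fill("") should give a similar result.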

Upvotes: 7
