Shankar

Reputation: 8967

Spark - save RDD to multiple files as output

I have a JavaRDD<Model> that I need to write out as more than one file, each with a different layout (one or two fields differ between the layouts).

When I use saveAsTextFile(), it calls the toString() method of Model, so every output is written with the same layout.

Currently I apply a map transformation to the RDD that returns a different model with the other layout, and then call the saveAsTextFile() action on the result to write the other output file.

So just because one or two fields differ, I have to iterate over the entire RDD again, create a new RDD, and then save that as the output file.

For example:

Current RDD with fields:

RoleIndicator, Name, Age, Address, Department

Output File 1:

Name, Age, Address

Output File 2:

RoleIndicator, Name, Age, Department

Is there any optimal solution for this?

Regards, Shankar

Upvotes: 3

Views: 4963

Answers (2)

kdgregory

Reputation: 39606

You want to use foreach, not collect.

You should define your function as an actual named class that extends VoidFunction. Create instance variables for both files, and add a close() method that closes the files. Your call() implementation will write whatever you need.

Remember to call close() on your function object after you're done.
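As a rough sketch of the shape described above (plain Java, no Spark dependency): a named class holding one writer per output file, a call() method that writes both layouts for each element, and a close() method. The Model fields come from the question; the class name TwoLayoutWriter, the comma separator, and the file paths are assumptions for illustration. Note that Spark serializes the function passed to foreach and runs it on the executors, so in a real job the writers would have to be opened there rather than in the constructor, and the files would land on the executors' file systems.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in for the question's Model class.
class Model {
    final String roleIndicator;
    final String name;
    final int age;
    final String address;
    final String department;

    Model(String roleIndicator, String name, int age, String address, String department) {
        this.roleIndicator = roleIndicator;
        this.name = name;
        this.age = age;
        this.address = address;
        this.department = department;
    }
}

// Mirrors the single call(...) method of Spark's VoidFunction<Model>,
// but is kept free of Spark types so it can be run on its own.
class TwoLayoutWriter implements AutoCloseable {
    private final BufferedWriter file1;
    private final BufferedWriter file2;

    TwoLayoutWriter(String path1, String path2) throws IOException {
        file1 = Files.newBufferedWriter(Paths.get(path1));
        file2 = Files.newBufferedWriter(Paths.get(path2));
    }

    // Writes one line per layout for a single element (comma separator is an assumption).
    public void call(Model m) throws IOException {
        file1.write(m.name + "," + m.age + "," + m.address);
        file1.newLine();
        file2.write(m.roleIndicator + "," + m.name + "," + m.age + "," + m.department);
        file2.newLine();
    }

    // Closes both files; the second is closed even if closing the first fails.
    @Override
    public void close() throws IOException {
        try {
            file1.close();
        } finally {
            file2.close();
        }
    }
}
```

Because the class implements AutoCloseable, a try-with-resources block around the loop that feeds it elements takes care of the close() call.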

Upvotes: 3

Vijay Innamuri

Reputation: 4372

This is possible with a pair RDD. A pair RDD can be written to multiple files in a single pass by using a custom Hadoop output format.

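// rdd is a JavaPairRDD<Text, Text>; path and jobConf are supplied by the caller
rdd.saveAsHadoopFile(path, Text.class, Text.class, FileGroupingTextOutputFormat.class, jobConf);


import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class FileGroupingTextOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  // Drop the key from the output line: with a null key, only the value is written
  // (an empty Text key would still be followed by the tab separator).
  @Override
  protected Text generateActualKey(Text key, Text value) {
    return null;
  }

  @Override
  protected Text generateActualValue(Text key, Text value) {
    return value;
  }

  // Returns a dynamic file name based on each RDD element.
  // Assumes the key carries the name of the target file; Text itself has no
  // field accessors, so a field of the value would have to be parsed out first.
  @Override
  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    return key.toString() + "-" + name;
  }
}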
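The answer does not say how the pair RDD is built from the original JavaRDD<Model>. One possibility, sketched here in plain Java under that assumption, is to expand each record into one keyed line per layout, so that a single pass yields both outputs and the key can be used to choose the file. In Spark this logic would sit inside something like flatMapToPair and produce Text pairs; the names LayoutSplitter, "layout1", and "layout2" and the comma separator are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the question's Model class.
class Model {
    final String roleIndicator;
    final String name;
    final int age;
    final String address;
    final String department;

    Model(String roleIndicator, String name, int age, String address, String department) {
        this.roleIndicator = roleIndicator;
        this.name = name;
        this.age = age;
        this.address = address;
        this.department = department;
    }
}

class LayoutSplitter {
    // Emits one (layoutName, line) pair per output layout for a single record.
    static List<Map.Entry<String, String>> toKeyedLines(Model m) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        // Output File 1: Name, Age, Address
        out.add(new SimpleEntry<>("layout1", m.name + "," + m.age + "," + m.address));
        // Output File 2: RoleIndicator, Name, Age, Department
        out.add(new SimpleEntry<>("layout2",
                m.roleIndicator + "," + m.name + "," + m.age + "," + m.department));
        return out;
    }
}
```

With pairs shaped like this, the output format only needs to look at the key to pick a file name, and the value is already the finished line for that layout.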

Upvotes: 0
