lucy

Reputation: 4506

How to save files in same directory using saveAsNewAPIHadoopFile spark scala

I am using Spark Streaming and I want to save each batch locally in Avro format. I have used saveAsNewAPIHadoopFile to write the data as Avro, and it works well, except that it overwrites the existing files: each batch's output replaces the previous batch's data. Is there a way to keep the Avro files from all batches in a common directory? I tried adding Hadoop job configuration properties to prefix the file names, but none of the properties worked.

dstream.foreachRDD { rdd =>
  rdd.saveAsNewAPIHadoopFile(
    path,
    classOf[AvroKey[T]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[T]],
    job.getConfiguration()
  )
}

Upvotes: 2

Views: 800

Answers (1)

Ajay Ahuja

Reputation: 1313

Try this -

You can split your process into 2 steps, as sketched below:

Step-01: Write the Avro file using saveAsNewAPIHadoopFile to <temp-path>
Step-02: Move the file from <temp-path> to <actual-target-path>
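
Here is a minimal sketch of that two-step approach, with the same assumptions as your snippet (a dstream of (AvroKey, NullWritable) pairs and a configured job); GenericRecord as the Avro type and basePath are hypothetical placeholders for illustration. Each batch is written to its own temporary path, then the part files are renamed into the common target directory with the batch time as a prefix so batches never collide:

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyOutputFormat
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.NullWritable

dstream.foreachRDD { (rdd, time) =>
  val tempPath  = s"$basePath/_tmp/batch-${time.milliseconds}"
  val targetDir = new Path(s"$basePath/avro")

  // Step-01: write this batch to its own temporary path
  rdd.saveAsNewAPIHadoopFile(
    tempPath,
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration()
  )

  // Step-02: move the part files into the common directory,
  // prefixing each name with the batch time so it stays unique
  val fs = FileSystem.get(job.getConfiguration())
  fs.mkdirs(targetDir)
  fs.listStatus(new Path(tempPath))
    .filter(_.getPath.getName.startsWith("part-"))
    .foreach { status =>
      val dest = new Path(targetDir, s"${time.milliseconds}-${status.getPath.getName}")
      fs.rename(status.getPath, dest)
    }
  fs.delete(new Path(tempPath), true) // clean up the temp directory
}

Note that fs.rename is a cheap metadata operation on HDFS and the local filesystem, so the move step adds very little overhead per batch.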

This should solve your problem for now. I will share my thoughts if I find a way to handle this scenario in one step instead of two.

Hope this is helpful.

Upvotes: 1
