Soheila S.

Reputation: 59

Save JavaRDD as XML file

Is there any way in Apache Spark to save a JavaRDD of text as an XML file?

Currently I save the RDD as a plain text file using the saveAsTextFile method and then convert it to XML. I would like to find a way to create the XML file directly from the RDD.

Any tips, ideas, or guides would be appreciated.

Upvotes: 0

Views: 521

Answers (1)

FaigB

Reputation: 2281

You can use the Databricks spark-xml library to read and write XML data. For example, reading an XML file with an inferred schema and writing part of it back out as XML:

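import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// sc is an existing JavaSparkContext; spark-xml must be on the classpath.
// (Spark 1.x API; in Spark 2.x use SparkSession and Dataset<Row> instead.)
SQLContext sqlContext = new SQLContext(sc);

// Read books.xml, treating each <book> element as one row (schema is inferred)
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.xml")
    .option("rowTag", "book")
    .load("books.xml");

// "_id" is the id attribute of <book> (spark-xml prefixes attributes with "_").
// Output is written as <books><book>...</book>...</books>, in a directory of part files.
df.select("author", "_id").write()
    .format("com.databricks.spark.xml")
    .option("rootTag", "books")
    .option("rowTag", "book")
    .save("newbooks.xml");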
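If you would rather not add a dependency, a rough alternative is to turn each line of text into an XML element yourself before saving. The sketch below is plain Java with no Spark types; XmlLines, escape, toElement and toDocument are hypothetical names, and the escaping covers only the five predefined XML entities (control characters that are illegal in XML 1.0 are not handled).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Spark or spark-xml) that turns lines of
// text into XML elements.
class XmlLines {

    // Escape the five characters that are special in XML text and attributes.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap one line of text in a <rowTag>...</rowTag> element.
    static String toElement(String rowTag, String line) {
        return "<" + rowTag + ">" + escape(line) + "</" + rowTag + ">";
    }

    // Build a complete document: XML declaration, root element, one row per line.
    static String toDocument(String rootTag, String rowTag, List<String> lines) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<").append(rootTag).append(">\n");
        for (String line : lines) {
            sb.append("  ").append(toElement(rowTag, line)).append("\n");
        }
        sb.append("</").append(rootTag).append(">\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("first line", "a < b & c");
        // Prints a <records> document with two escaped <record> rows
        System.out.print(toDocument("records", "record", lines));
    }
}
```

In a Spark job, the per-line helper could be applied with rdd.map(line -> XmlLines.toElement("record", line)) followed by saveAsTextFile. Each part file then holds record elements but no XML declaration or root element, so the output is not a single well-formed document. For small data, collecting to the driver with rdd.collect() and passing the list to toDocument avoids that; larger data is a better fit for the spark-xml writer.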

Upvotes: 1
