Is there any way in Apache Spark to save a Java RDD of text as an XML file?
Currently I save the RDD as a plain text file with the saveAsTextFile method and then convert it to XML. I'd like to find a way to create the XML file directly from the RDD.
Any tips, ideas or guidance would be appreciated.
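You can use the Databricks spark-xml library to read and write XML data. In this example the schema is inferred from the data on read, and selected columns are then written back out as XML:
```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// sc is an existing JavaSparkContext (Spark 1.x API; Spark 2+ uses SparkSession and Dataset<Row>)
SQLContext sqlContext = new SQLContext(sc);

// Read: each <book> element becomes one row; the schema is inferred from the data
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.xml")
    .option("rowTag", "book")
    .load("books.xml");

// Write: "_id" is the book's id attribute (spark-xml prefixes attributes with "_" by default)
df.select("author", "_id").write()
    .format("com.databricks.spark.xml")
    .option("rootTag", "books")  // element wrapping the rows
    .option("rowTag", "book")    // element written for each row
    .save("newbooks.xml");
```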
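spark-xml works on DataFrames, so a JavaRDD&lt;String&gt; would first have to be turned into one (for example with `sqlContext.createDataFrame` from an RDD of `Row`s plus a schema). Also note that `save("newbooks.xml")` produces a directory of part files, not a single file. For plain text records, a lighter alternative is to build the XML yourself, per partition via `mapPartitions`, and then call `saveAsTextFile`. The sketch below shows only that formatting step in plain Java, without Spark, so it runs on its own. The class and method names are hypothetical, and the text is placed directly inside each row element, which is a simplification: spark-xml would nest a field element (such as `<value>`) inside each row instead.

```java
import java.util.List;

// Hypothetical helper: formats text records as a simple XML document.
class XmlRowSketch {

    // Escapes the five XML special characters so arbitrary text is safe as element content.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps each record in <rowTag> and the whole batch in <rootTag>.
    // In Spark this would be applied to one partition's records at a time.
    static String toXml(List<String> records, String rootTag, String rowTag) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(rootTag).append(">\n");
        for (String r : records) {
            sb.append("  <").append(rowTag).append('>')
              .append(escapeXml(r))
              .append("</").append(rowTag).append(">\n");
        }
        sb.append("</").append(rootTag).append(">\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Spark & XML", "a < b");
        System.out.print(toXml(lines, "books", "book"));
    }
}
```

Escaping matters here: writing raw lines with saveAsTextFile would produce invalid XML as soon as a record contains `&` or `<`.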