Reputation: 159
I have another question about Spark and Scala. I want to use that technologie to get data and generate a xml. Therefore, I want to know if it is possible to create node ourself (not automatic creation) and what library can we use ? I search but I found nothing very interesting(Like I'm new in this technologie, I don't know many keywords). I want to know if there is in Spark something like this code (I write that in scala. It works in local but I can't use new File() in Spark).
val docBuilder: DocumentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
val document = docBuilder.newDocument()
ar root:Element = document.createElement("<name Balise>")
attr = document.createAttribute("<attr1>")
attr.setValue("<value attr1>")
root.setAttributeNode(<attr>)
attr = document.createAttribute("<attr2>")
attr.setValue("<value attr2>")
root.setAttributeNode(attr)
document.appendChild(root)
document.setXmlStandalone(true)
var transformerFactory:TransformerFactory = TransformerFactory.newInstance()
var transformer:Transformer = transformerFactory.newTransformer()
var domSource:DOMSource = new DOMSource(document)
var streamResult:StreamResult = new StreamResult(new File(destination))
transformer.transform(domSource,streamResult)
I want to know if it's possible to do that with spark.
Thanks for your answer and have a good day.
Upvotes: 1
Views: 1152
Reputation: 1164
Not exactly, but you can do something similar by using Spark XML API pr XStream API on Spark.
First try using Spark XML API which is most useful when reading and writing XML files using Spark. However, At the time of writing this, Spark XML has following limitations.
1) Adding attribute to root element has not supported.
2) Does not support following structure where you have header and footer elements.
<parent>
<header></header>
<dataset>
<data attr="1"> suports xml tags and data here</data>
<data attr="2">value2</data>
</dataset>
<footer></footer>
</parent>
If you have one root element and following data then Spark XML is go to api.
Alternatively, you can look at XStream API. Below are steps how to use it to create custom XML structures.
1) First, create a Scala class similar to the structure you wanted in XML.
case class XMLData(name:String, value:String, attr:String)
2) Create an instance of this class
val data = XMLData("bookName","AnyValue", "AttributeValue")
3) Conver data object to XML using XStream API. If you already have data in a DataFrame, then do a map transformation to convert data to an XML string and store it back in DataFrame. if you do so, then you can skip step #4
val xstream = new XStream(new DomDriver)
val xmlString = xstream.toXML(data)
4) Now convert xmlString to DataFrame
val df = xmlString.toDF()
5) Finally, write to a file
df.write.text("file://filename")
Here isa full sample example with XStream API
import com.thoughtworks.xstream.XStream
import com.thoughtworks.xstream.io.xml.DomDriver
import org.apache.spark.sql.SparkSession
case class Animal(cri:String,taille:Int)
object SparkXMLUsingXStream{
def main(args: Array[String]): Unit = {
val spark = SparkSession.
builder.master ("local[*]")
.appName ("sparkbyexamples.com")
.getOrCreate ()
var animal:Animal = Animal("Rugissement",150)
val xstream1 = new XStream(new DomDriver())
xstream1.alias("testAni",classOf[Animal])
xstream1.aliasField("cricri",classOf[Animal],"cri")
val xmlString = Seq(xstream1.toXML(animal))
import spark.implicits._
val newDf = xmlString.toDF()
newDf.show(false)
}
}
Hope this helps !!
Thanks
Upvotes: 1