user6325753

Reputation: 577

Parse XML data in Apache Spark

I need to know how to parse XML in Spark. I am receiving streaming data from Kafka and need to parse each streamed message.

Here is my Spark code to receive data:

directKafkaStream.foreachRDD(rdd -> {
    rdd.foreach(s -> {
        // s._2 is the message value, which holds the XML payload
        System.out.println("&&&&&&&&&&&&&&&&&" + s._2);
    });
});

And the output:

<root>
<student>
<name>john</name>
<marks>90</marks>
</student>
</root>

How can I parse these XML elements?

Upvotes: 2

Views: 2080

Answers (2)

user6325753

Reputation: 577

Thanks, everyone. The problem is solved; here is the solution.

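// Assumption: DOMParser here is the Apache Xerces parser, which must be on the classpath
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

String xml = "<name>xyz</name>";
DOMParser parser = new DOMParser();
try {
    parser.parse(new InputSource(new java.io.StringReader(xml)));
    Document doc = parser.getDocument();
    // Text content of the root element, here "xyz"
    String message = doc.getDocumentElement().getTextContent();
    System.out.println(message);
} catch (Exception e) {
    // parse() can throw SAXException or IOException
    e.printStackTrace();
}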
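
To apply the same idea to the sample message from the question, the following is a minimal sketch that uses the JDK's built-in DocumentBuilder instead of the Xerces DOMParser, so it needs no extra dependency. The class name StudentXmlParser and the helper textOf are hypothetical names chosen for illustration, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
public class StudentXmlParser {

    // Returns the text of the first element named `tag`, or null if there is none.
    public static String textOf(String xml, String tag) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        // Same structure as the message shown in the question
        String xml = "<root><student><name>john</name><marks>90</marks></student></root>";
        System.out.println(textOf(xml, "name"));
        System.out.println(textOf(xml, "marks"));
    }
}
```

Inside the rdd.foreach lambda from the question, this could presumably be called as textOf(s._2, "name"), with the checked exception handled there. The sketch parses the whole document once per call, so for many fields per message it would be cheaper to parse once and read all elements from the same Document.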

Upvotes: 3

Amit Kulkarni

Reputation: 704

Since you are processing streaming data, it may be helpful to use Databricks' spark-xml library to process the XML.

Reference: https://github.com/databricks/spark-xml

Upvotes: 2
