major

Reputation: 57

How to generate a list with an XML element as key and the XML as value in Scala

I have a stream of XML records that I process in Scala using hadoopRDD and finally save to a single file. However, I need to sort those XML records by certain attributes before saving them to the output file.

I thought of creating a list of (key, XML) pairs, where the key is a value taken from the XML, like below:

Input

<Transaction>
    <eventId>1234</eventId>
    <eventName>hello</eventName>
    .......
</Transaction>
<Transaction>
    <eventId>2345</eventId>
    <eventName>hi</eventName>
    .......
</Transaction>

--- and so on

My idea is to create a list such as {(1234, xml1), (2345, xml2), ...}, sort on the first element, and save the second element to the output file.

How can this be done in Scala, or is there a better approach? Thanks in advance for your suggestions and help.

Upvotes: 0

Views: 166

Answers (1)

major

Reputation: 57

I was able to figure it out as follows. First, I created a function that extracts eventId from an XML record, and used it to pair each record with its eventId:

val rdd = input.map { x => (geteventId(x), x) }

Then I sorted on eventId, kept only the XML, and saved the result to HDFS:

val result = rdd.sortBy(x => x._1).map(x => x._2)

geteventId(x) parses the XML record and returns the value of its eventId element.
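For readers who want to try the idea without a cluster, here is a minimal sketch of the same pair, sort, and project steps on a plain Scala Seq. The body of geteventId is not shown above, so the getEventId helper below is a hypothetical stand-in. It assumes each record is a String holding one numeric <eventId> element, reads it with a regex instead of an XML parser, and sorts records without an eventId last. All of those are assumptions of this sketch, not details from the original code.

```scala
object SortTransactions {
  // Hypothetical pattern: assumes a numeric <eventId>...</eventId> element per record.
  private val EventId = "<eventId>\\s*(\\d+)\\s*</eventId>".r

  // Hypothetical stand-in for geteventId: returns the first eventId as a Long.
  // Records with no match get Long.MaxValue so they sort to the end (an assumption).
  def getEventId(xml: String): Long =
    EventId.findFirstMatchIn(xml).map(_.group(1).toLong).getOrElse(Long.MaxValue)

  // Pair each record with its key, sort on the key, then keep only the record.
  def sortByEventId(records: Seq[String]): Seq[String] =
    records
      .map(x => (getEventId(x), x))
      .sortBy(_._1)
      .map(_._2)
}
```

On an RDD the shape should be the same, since RDD.sortBy also takes a key function. Something like input.sortBy(getEventId) would likely avoid building the intermediate pairs altogether. For anything beyond flat, well-formed records, a real XML parser is probably a safer choice than a regex.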

Upvotes: 1
