Timofey Pasichnik

Reputation: 23

Dynamically parsing XML in Groovy

I have XML in this format:

<message>
  <message_type_id>1</message_type_id>
  <message_type_code>code1</message_type_code>
  <version/>
  <created_at>date1</created_at>
  <payload>
    <payment>
      <document_id>id1</document_id>
      <account_id>id2</account_id>
    </payment>
  </payload>
</message>

The branch inside payload is not fixed: it can have one structure in one XML document and a different structure in another.

As a result I want a dynamic key/value map like this:

message_type_id: 1
message_type_code: code1
created_at: date1
document_id: id1
account_id: id2

Keep in mind that the keys "document_id" and "account_id" may sit in a different structure, at different nesting levels. In other words, I need to parse only the leaves of each XML tree. I don't know in advance what these leaves are called, so constructions like

root.payload.payment.document_id

aren't useful.

I tried to solve this with XmlSlurper, but did not succeed. How can I do it?

Upvotes: 0

Views: 422

Answers (1)

tim_yates

Reputation: 171054

So given the XML (here in a String):

def xmlText = '''<message>
  <message_type_id>1</message_type_id>
  <message_type_code>code1</message_type_code>
  <version/>
  <created_at>date1</created_at>
  <payload>
    <payment>
      <document_id>id1</document_id>
      <account_id>id2</account_id>
    </payment>
  </payload>
</message>'''

We can parse it with XmlParser:

def message = new XmlParser().parseText(xmlText)

We can then find all the nodes that are not empty and contain only text (these are the leaf nodes), and use collectEntries to build a map from them:

def map = message.'**'.findAll { node ->
    !node.children().empty &&                         // Ignore empty leaves such as <version/>
    node.children().every { it instanceof String }    // Leaves only contain strings
}.collectEntries {
    [it.name(), it.text()]                            // Element name -> text content
}

This assigns the following to the variable map:

[
    message_type_id: "1",
    message_type_code: "code1",
    created_at: "date1",
    document_id: "id1",
    account_id: "id2"
]
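
Since the question mentions XmlSlurper, here is a rough equivalent of the same idea using it. This is only a sketch: it assumes Groovy 3 or later (where the class lives in groovy.xml; on Groovy 2.x it is in groovy.util and needs no import), and it treats an element as a leaf when it has no child elements and non-empty text. The variable names slurped and leafMap are made up for illustration.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on 2.x XmlSlurper is in groovy.util (imported by default)

def xmlText = '''<message>
  <message_type_id>1</message_type_id>
  <version/>
  <payload>
    <payment>
      <document_id>id1</document_id>
      <account_id>id2</account_id>
    </payment>
  </payload>
</message>'''

def slurped = new XmlSlurper().parseText(xmlText)

// Walk the tree depth-first and keep elements with no child elements and some text
def leafMap = slurped.'**'.findAll { node ->
    node.children().size() == 0 &&   // no nested elements
    node.text()                      // skip empty leaves such as <version/>
}.collectEntries {
    [it.name(), it.text()]           // element name -> text content
}
```

As with XmlParser, all values come back as strings.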

You will have issues if several leaf nodes share the same name, however: a map holds only one value per key, so a later node overwrites the value of an earlier one.
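
If duplicate leaf names are possible, one option (not part of the original approach, just a sketch) is to group the leaves by name and keep a list of values per key. The sample XML with two payment elements and the names leaves and multiMap are invented for illustration; the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlParser  // Groovy 3+; on 2.x XmlParser is in groovy.util (imported by default)

// Hypothetical input with the same leaf name appearing twice
def xmlText = '''<message>
  <payload>
    <payment><document_id>id1</document_id></payment>
    <payment><document_id>id2</document_id></payment>
  </payload>
</message>'''

def message = new XmlParser().parseText(xmlText)

// Same leaf test as above: non-empty and containing only text
def leaves = message.'**'.findAll { node ->
    !node.children().empty && node.children().every { it instanceof String }
}

// Group leaves by element name, then keep every text value in document order
def multiMap = leaves.groupBy { it.name() }.collectEntries { name, nodes ->
    [name, nodes*.text()]
}
```

Every value is then a list, even for names that occur only once, which keeps the result uniform at the cost of an extra level of nesting.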

Upvotes: 1
