Askar

Reputation: 5854

How do I force an XML node to be parsed as an array of hashes?

This is my simplified myXML:

<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
  <Message>
      <MemberId>A00000001</MemberId>
      <MemberName>Bruce</MemberName>
    <Line>
      <LineNumber>3.1</LineNumber>
      <Item>fruit-004</Item>
      <Description>Peach</Description>
    </Line>
    <Line>
      <LineNumber>4.1</LineNumber>
      <Item>fruit-001</Item>
      <Description>Peach</Description>
    </Line>
  </Message>
</ShipmentRequest>

When I parse it with the Crack gem, myHash is:

{
   "MemberId"=>"A00000001", 
   "MemberName"=>"Bruce", 
   "Line"=>[
       {"LineNumber"=>"3.1", "Item"=>"fruit-004", "Description"=>"Peach"}, 
       {"LineNumber"=>"4.1", "Item"=>"fruit-001", "Description"=>"Peach"}
    ]
}

The Crack gem parses Line as an array because there are two <Line> nodes in myXML. But if myXML contained only one <Line> node, the Crack gem would not parse it as an array:

{
    "MemberId"=>"ABC0001", 
    "MemberName"=>"Alan", 
    "Line"=> {"LineNumber"=>"4.1", "Item"=>"fruit-004", "Description"=>"Apple"}
}

I want Line to be an array even when there is only one <Line> node:

{
    "MemberId"=>"ABC0001", 
    "MemberName"=>"Alan", 
    "Line"=> [{"LineNumber"=>"4.1", "Item"=>"fruit-004", "Description"=>"Apple"}]
}

Upvotes: 0

Views: 915

Answers (2)

the Tin Man

Reputation: 160551

The problem is that you're relying on a library to do what you really should do yourself. Crack has no idea that you want a single node to become a one-element array, and its behavior makes it a lot more difficult for you when you try to dive into that portion of the data.

Parsing XML isn't hard, and by parsing it yourself you'll know what to expect and will avoid the hassle of dealing with the "sometimes it's an array and sometimes it's not" result returned by Crack.

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
  <Message>
      <MemberId>A00000001</MemberId>
      <MemberName>Bruce</MemberName>
    <Line>
      <LineNumber>3.1</LineNumber>
      <Item>fruit-004</Item>
      <Description>Peach</Description>
    </Line>
    <Line>
      <LineNumber>4.1</LineNumber>
      <Item>fruit-001</Item>
      <Description>Peach</Description>
    </Line>
  </Message>
</ShipmentRequest>
EOT

That sets up the DOM, so it can be navigated:

hash = {}
message = doc.at('Message')
hash[:member_id] = message.at('MemberId').text
hash[:member_name] = message.at('MemberName').text
lines = message.search('Line').map do |line|
  line_number = line.at('LineNumber').text 
  item = line.at('Item').text 
  description = line.at('Description').text

  {
    :line_number => line_number,
    :item        => item,
    :description => description
  }
end
hash[:lines] = lines

  1. message = doc.at('Message') finds the first <Message> node.
  2. message.at('MemberId').text finds the first <MemberId> node inside <Message> and returns its text.
  3. message.at('MemberName').text does the same for <MemberName>.
  4. message.search('Line') finds all <Line> nodes inside <Message>.

From those descriptions you can figure out the rest.

After running, hash looks like:

{:member_id=>"A00000001",
:member_name=>"Bruce",
:lines=>
  [{:line_number=>"3.1", :item=>"fruit-004", :description=>"Peach"},
  {:line_number=>"4.1", :item=>"fruit-001", :description=>"Peach"}]}

If I remove one of the <Line> blocks from the XML, and re-run, I get:

{:member_id=>"A00000001",
:member_name=>"Bruce",
:lines=>[{:line_number=>"3.1", :item=>"fruit-004", :description=>"Peach"}]}

Using search to locate the <Line> nodes is the trick. search returns a NodeSet, which behaves much like an Array no matter how many nodes it holds, so iterating over it with map returns an array of hashes, one for each <Line> tag.

Nokogiri is a great tool for parsing HTML and XML, then allowing us to search, add, change or remove nodes. It supports CSS and XPath accessors, so if you are used to jQuery or how CSS works, or XPath expressions, you'll be off and running quickly. The tutorials for Nokogiri are a good starting place to learn how it works.

Upvotes: 1

fbonetti

Reputation: 6672

After you convert the XML document to a hash you could do this:

myHash["Line"] = [myHash["Line"]] if myHash["Line"].kind_of?(Hash)

This ensures that the Line value is always wrapped in an Array.
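
If the same fix is needed in several places, the check could be pulled into a small helper. This is only a sketch: wrap_in_array is a made-up name, and treating a missing key (nil) as an empty array is an assumption about what you want when there are no <Line> nodes at all. Note that Ruby's built-in Array(value) is not a substitute here, because it turns a Hash into an array of key/value pairs.

```ruby
# Hypothetical helper: always return an Array for a value that may be
# nil (no nodes), a Hash (one node) or an Array (several nodes).
def wrap_in_array(value)
  case value
  when nil   then []       # assumption: no node means an empty list
  when Array then value    # already an Array, leave it alone
  else            [value]  # a single Hash becomes a one-element Array
  end
end

# Sample hash shaped like the single-<Line> result in the question.
single = {
  "MemberId" => "ABC0001",
  "Line"     => { "LineNumber" => "4.1", "Item" => "fruit-004" }
}

single["Line"] = wrap_in_array(single["Line"])
p single["Line"].length  # → 1
```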

Upvotes: 4
