VirInvictus

Reputation: 11

Transforming XML structures using Ruby

I've been racking my brain trying to solve this problem. This is my first time using a scripting language for this kind of work, and I may have picked a hard job to start with. Essentially, I need to transform some basic XML into a heavier XML structure.

Example:

Translate the following:

<xml>
  <test this="stuff">13141</test>
  <another xml="tag">do more stuff</another>
</xml>

Into this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Package>
<Package version="1.0">
  <tests>
    <test name="stuff">
      <information>13141</information>
    </test>
  </tests>
  <anothers>
    <another name="tag">
      <information>do more stuff</information>
    </another>
  </anothers>
</Package>

I've tried doing it manually with regular expressions, but that takes a lot of work. I've also tried storing, for example, multiple test tags in an array so that I can put them all inside the single tests tag of the second example, but I can't keep track of everything. I've looked into REXML and Hpricot, but can't figure out how to use them to do this properly.

So, basically, what I'm asking is: does anyone have ideas on how I might manage this more efficiently?
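The bookkeeping described above, collecting every `<test>` so they can all go under one `<tests>`, is essentially a grouping step. A minimal sketch using REXML (bundled with Ruby), assuming the source root holds only the flat child elements and that each element's first attribute supplies the value for `name`; `group_children` is a hypothetical helper, not part of REXML:

```ruby
require 'rexml/document'

# Hypothetical helper: parse the flat source XML and return a Hash that maps
# each tag name to an Array of { name:, text: } entries, in document order.
# Assumption: the first attribute of each element supplies the "name" value.
def group_children(xml_string)
  doc = REXML::Document.new(xml_string)
  groups = Hash.new { |hash, key| hash[key] = [] }

  doc.root.each_element do |child|
    first_value = nil
    # REXML::Attributes#each yields [attribute_name, attribute_value] pairs
    child.attributes.each { |_attr_name, value| first_value ||= value }

    groups[child.name] << { name: first_value, text: child.text.to_s.strip }
  end

  groups
end

source = <<~XML
  <xml>
    <test this="stuff">13141</test>
    <another xml="tag">do more stuff</another>
    <test this="more">42</test>
  </xml>
XML

group_children(source).each do |tag, entries|
  puts "#{tag}: #{entries.length} element(s)"
end
# prints "test: 2 element(s)" then "another: 1 element(s)"
```

Once the data is grouped like this, each plural wrapper is one loop over a key's entries instead of manual tracking.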

Upvotes: 1

Views: 538

Answers (3)

tig

Reputation: 27830

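require 'rubygems'
require 'hpricot'
# String#pluralize comes from ActiveSupport's inflections
require 'active_support'
require 'active_support/core_ext/string/inflections'

source = <<-XML
<xml>
<test this="stuff">13141</test>
<another xml="tag">do more stuff</another>
</xml>
XML

# Collect the element children of <xml> (skipping whitespace text nodes),
# grouped by tag name so that several <test> elements share one <tests>
def source_children_by_name(source)
  doc = Hpricot.XML(source)
  elements = doc.at('xml').children.select { |child| child.is_a?(Hpricot::Elem) }
  elements.group_by { |child| child.name }
end

output = Hpricot.build do |doc|
  doc << '<?xml version="1.0" encoding="UTF-8"?>'
  doc << '<!DOCTYPE Package>'
  doc.tag! :Package, :version => '1.0' do |package|
    source_children_by_name(source).each do |name, children|
      # One plural wrapper per tag name, e.g. <tests>
      package.tag! name.pluralize do |outer|
        children.each do |child|
          # The first attribute's value becomes name="..."
          outer.tag! child.name, :name => child.attributes.values.first do |inner|
            inner.tag! :information do |information|
              information.text! child.innerText
            end
          end
        end
      end
    end
  end
end

puts output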

Note that the output will have no whitespace between tags.

Upvotes: 1

Benjamin Oakes

Reputation: 12782

Hpricot and Builder in combination may provide what you're looking for. The steps would be:

  1. Read in the XML with Hpricot
  2. Pick out the elements you want
  3. Write out your new XML (with Builder) by iterating over the elements from Hpricot

Upvotes: 0

Nick Lewis

Reputation: 4230

Look into XSLT. I only have a passing familiarity with it, but its purpose is to transform XML documents from one form into another, which sounds like what you need.

Upvotes: 2
