bastibe
bastibe

Reputation: 17209

How to replace every occurrence of a pattern in a string using Ruby?

I have an XML file which is too big. To make it smaller, I want to replace all tags and attribute names with shorter versions of the same thing.

So, I implemented this:

string.gsub!(/<(\w+) /) do |match|
    case match
    when 'Image' then 'Img'
    when 'Text'  then 'Txt'
    end
end

puts string

which deletes all opening tags but does not do much else.

What am I doing wrong here?

Upvotes: 1

Views: 486

Answers (3)

the Tin Man
the Tin Man

Reputation: 160571

Here's the beauty of using a parser such as Nokogiri:

This lets you manipulate selected tags (nodes) and their attributes:

require 'nokogiri'

xml = <<EOT
<xml>
  <Image ImagePath="path/to/image">image comment</Image>
  <Text TextFont="courier" TextSize="9">this is the text</Text>
</xml>
EOT

doc = Nokogiri::XML(xml)
doc.search('Image').each do |n| 
  n.name = 'img' 
  n.attributes['ImagePath'].name = 'path'
end
doc.search('Text').each do |n| 
  n.name = 'txt'
  n.attributes['TextFont'].name = 'font'
  n.attributes['TextSize'].name = 'size'
end
print doc.to_xml
# >> <?xml version="1.0"?>
# >> <xml>
# >>   <img path="path/to/image">image comment</img>
# >>   <txt font="courier" size="9">this is the text</txt>
# >> </xml>

If you need to iterate through every node, maybe to do a universal transformation on the tag-name, you can use doc.search('*').each. That would be slower than searching for individual tags, but might result in less code if you need to change every tag.

The nice thing about using a parser is it'll work even if the layout of the XML changes since it doesn't care about whitespace, and will work even if attribute order changes, making your code more robust.

Upvotes: 2

glenn mcdonald
glenn mcdonald

Reputation: 15488

Here's another way:

class String
  def minimize_tags!
    {"image" => "img", "text" => "txt"}.each do |from,to|
      gsub!(/<#{from}\b/i,"<#{to}")
      gsub!(/<\/#{from}>/i,"<\/#{to}>")
    end
    self
  end
end

This will probably be a little easier to maintain, since the replacement patterns are all in one place. And on strings of any significant size, it may be a lot faster than Kevin's way. I did a quick speed test of these two methods using the HTML source of this stackoverflow page itself as the test string, and my way was about 6x faster...

Upvotes: 2

Kevin
Kevin

Reputation: 1855

Try this:

string.gsub!(/(<\/?)(\w+)/) do |match|
  tag_mark = $1
  case $2
  when /^image$/i
    "#{tag_mark}Img"
  when /^text$/i
    "#{tag_mark}Txt"
  else
    match
  end
end  

Upvotes: 1

Related Questions