user2334188

Reputation: 13

How to remove part of a string (not just replace it) via regex in Ruby

I have a string from an API response, and I use an after_filter to adjust the body of the message. I am replacing a few characters and removing a whole node.

response.body = response.body.gsub(/\<(test.*)\>/, '<\1 "random_stuff">').gsub(
      /\<(wantToDelete.*)\>/, "").gsub(/\<\/(wantToDelete.*)\>/, "")

This somewhat works, except that, as expected, the part I want to delete is now a blank line when I really just want it gone.

Before:

<random_stuff>
  <wantToDelete>
    <startDate>2013-11-15</startDate>
  </wantToDelete>
</random_stuff>

After:

<testrandom_stuff>

  <startDate>2013-11-15</startDate>

</testrandom_stuff>

What I want:

 <testrandom_stuff>
  <startDate>2013-11-15</startDate>
 </testrandom_stuff>

Any way to remove the node and gaps?

Upvotes: 1

Views: 346

Answers (2)

the Tin Man

Reputation: 160551

While it's possible to manipulate XML using gsub and regular expressions, it's not recommended because it results in a very fragile solution. XML and HTML are difficult to process with patterns because opening and closing tags have to be matched up, and their freeform structure is prone to change.

Instead use a parser, like Nokogiri:

require 'nokogiri'

str = <<EOT
<random_stuff>
  <wantToDelete>
    <startDate>2013-11-15</startDate>
  </wantToDelete>
</random_stuff>
EOT

doc = Nokogiri::XML::DocumentFragment.parse(str)

At this point the XML is converted to a DOM in memory and can be searched and changed quite easily.

This is how to find the <random_stuff> and <startDate> tags:

random_stuff = doc.at('random_stuff')
start_date = doc.at('startDate')

Once we have those, we can tell Nokogiri to replace the child nodes of <random_stuff> with the <startDate> node, and rename the <random_stuff> node to <testrandom_stuff>:

random_stuff.children = start_date
random_stuff.name = 'test' + random_stuff.name

Notice that Nokogiri automatically adjusted the name of the closing tag too.

And here's what it looks like:

puts doc.to_xml
# >> <testrandom_stuff>
# >>   <startDate>2013-11-15</startDate>
# >> </testrandom_stuff>

Upvotes: 0

hirolau

Reputation: 13911

The line break is usually just the character \n. Maybe you could try to match it in the regexp as well?

response.body.gsub(/\<(test.*)\>/, '<\1 "random_stuff">').gsub(
  /\<(wantToDelete.*)\>\n/, "").gsub(/\<\/(wantToDelete.*)\>\n/, "")
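
A minimal sketch of that idea, assuming each wantToDelete tag sits on its own line as in the sample from the question. It also matches the indentation in front of the tag, since matching only the trailing \n would leave those spaces behind. The names body, tag_line and cleaned are made up for illustration; in the after_filter you would run the gsub on response.body instead.

```ruby
# Sample input copied from the question (the squiggly heredoc strips the
# two-space indentation used here for readability).
body = <<~XML
  <random_stuff>
    <wantToDelete>
      <startDate>2013-11-15</startDate>
    </wantToDelete>
  </random_stuff>
XML

# Matches a whole line that holds only an opening or closing wantToDelete tag:
# leading spaces/tabs, the tag itself, and the trailing newline.
# In Ruby, ^ anchors at the start of every line, not only the start of the string.
tag_line = /^[ \t]*<\/?wantToDelete[^>]*>\n/

# Removing the whole line means no blank line is left behind.
cleaned = body.gsub(tag_line, "")
puts cleaned
```

The child element keeps its original indentation, so startDate stays indented by four spaces. If the tags can share a line with other content, this pattern will not match them, and a parser is the safer route.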

Upvotes: 1
