Jaan J

Reputation: 524

Ruby on Rails: regular expression to find and remove tags between other tags in an HTML string

I'm working in Ruby on Rails and need to do the following:

Remove all "br" HTML tags that appear between "code" HTML tags in a string of HTML. The "code" tags might occur more than once.

To be clear, I'm not trying to do screen scraping. I have a blog and would like to allow people to use the code HTML tags only in the comments. When formatting the string I normally use simple_format, but I'd like it to leave the content of code HTML tags alone.

Thanks in advance.

Upvotes: 0

Views: 2609

Answers (3)

vonconrad

Reputation: 25387

If you absolutely, positively have to use a regexp, try this one, which catches all <br>, <br/> and <br /> tags inside <code> elements:

str.gsub(/<code>.+?<\/code>/) {|s| s.gsub(/<br\s*\/?>/, "")}

Tested with:

str = "Lorem ipsum dolor sit amet<br />, <code>consectetur adipisicing elit<br />, sed do eiusmod tempor incididunt ut labore<br> et dolore magna aliqua</code>. Ut enim ad minim veniam,<br> quis nostrud exercitation ullamco laboris nisi<br/> ut aliquip ex ea commodo consequat. <code>Duis aute irure dolor in reprehenderit<br /> in voluptate velit esse cillum dolore<br/> eu fugiat nulla pariatur.</code> Excepteur sint occaecat cupidatat non proident,<br /> sunt in culpa qui officia deserunt mollit anim id est laborum."
p str.gsub(/<code>.+?<\/code>/) {|s| s.gsub(/<br\s*\/?>/, "")}
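
One caveat with the pattern above: in Ruby, . does not match a newline unless the m flag is set, so a code block that spans several lines is left untouched. Below is a minimal sketch of a variant that also handles multi-line blocks, uppercase tags, and attributes on the code tag. The names CODE_BLOCK, BR_TAG and strip_br_in_code are made up for illustration and do not come from Rails or any library.

```ruby
# Hypothetical constants and helper, for illustration only.
# m: let . match newlines; i: match <CODE> and <BR> as well.
CODE_BLOCK = /<code\b[^>]*>.*?<\/code>/mi
BR_TAG     = /<br\s*\/?>/i

# Removes <br> tags only inside <code>...</code> spans and
# leaves every <br> outside those spans untouched.
def strip_br_in_code(html)
  html.gsub(CODE_BLOCK) { |block| block.gsub(BR_TAG, "") }
end

html = "line one<br />\n<code>puts 1<br />\nputs 2<BR>\n</code><br>after"
puts strip_br_in_code(html)
```

Like the original one-liner, this does not cope with nested code tags or malformed markup, which is where a real parser is the safer choice.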

If you don't have to use a regexp, use an HTML parser such as Nokogiri.

Upvotes: 4

Rilindo

Reputation: 1796

I second Hpricot, but what are you trying to do? Are you attempting some sort of web scraping, or are you parsing HTML from a model?

Upvotes: 0

squeeks

Reputation: 1269

Using Hpricot or an HTML parser of your choice would be a far, far better idea.

Upvotes: 0
