How to find multiple substring matches within a string, alter substring enclosures

Question

I am trying to parse a string of HTML with Ruby. The string contains multiple <pre></pre> elements, and I need to find and encode all < and > brackets between the opening and closing tags of each one.

Example: 

string_1_pre = "<pre><h1>Welcome</h1></pre>"

string_2_pre = "<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>"

def clean_pre_code(html_string)
  matched = html_string.match(/(?<=<pre>).*(?=<\/pre>)/)
  cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
  html_string.gsub(/(?<=<pre>).*(?=<\/pre>)/, cleaned)
end

clean_pre_code(string_1_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre>"
clean_pre_code(string_2_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;&lt;/pre&gt;&lt;pre&gt;&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>"



This works as long as html_string contains only one <pre></pre> element, but not if there are multiple.
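
A likely cause of the difference is that `.*` is greedy: it runs from the first `<pre>` to the last `</pre>`, so two adjacent blocks are treated as one. The sketch below only illustrates that behavior by comparing the greedy pattern with a lazy one (`.*?`); the variable names are made up for the illustration.

```ruby
# Illustration only: compare greedy and lazy matching on a two-<pre> input.
html = "<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>"

# Greedy: .* extends to the last </pre>, swallowing the inner "</pre><pre>".
greedy = html.scan(/(?<=<pre>).*(?=<\/pre>)/)

# Lazy: .*? stops at the first </pre> it can, so each block matches separately.
lazy = html.scan(/(?<=<pre>).*?(?=<\/pre>)/)

puts greedy.length  # 1
puts lazy.length    # 2
```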

I would be open to a solution that uses Nokogiri or similar, but I couldn't figure out how to make it do what I want.

Please let me know if you need any additional context.

Update:
This can be done with Nokogiri; see the accepted answer.

tkhuynh · Accepted Answer

@zstrad44 Yes, you can do this with Nokogiri. Here is my version, adapted from your code; it gives the result you want when the string contains multiple <pre> tags.

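require 'nokogiri'

# Returns the <pre> elements found in html_string, concatenated, with the
# < and > inside each one encoded as &lt; and &gt;.
def clean_pre_code(html_string)
  doc = Nokogiri::HTML(html_string)
  all_pre = doc.xpath('//pre')
  res = ""
  all_pre.each do |pre|
    pre = pre.to_html
    # Each <pre> is serialized on its own, so the greedy match stays inside it.
    # The m flag lets . match newlines within the block.
    matched = pre.match(/(?<=<pre>).*(?=<\/pre>)/m)
    cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
    res += pre.gsub(/(?<=<pre>).*(?=<\/pre>)/m, cleaned)
  end
  res
end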


I recommend reading the Nokogiri Cheatsheet to get a better understanding of the methods used in this code. Happy coding! I hope this helps.
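
For simple, well-formed input like the examples in the question, a standard-library-only variant may also be worth trying. The following is a minimal sketch, not part of the original answer: `escape_pre_contents`, `PRE_BLOCK`, and `ENTITIES` are hypothetical names, and it assumes that <pre> elements are not nested and that no attribute value on the opening tag contains a `>`. Unlike a full parser, it will not cope with malformed markup.

```ruby
# Hypothetical helper (not from the original answer): encode < and > inside
# each <pre>...</pre> pair and leave the rest of the string untouched.
PRE_BLOCK = /(<pre\b[^>]*>)(.*?)(<\/pre>)/m
ENTITIES  = { "<" => "&lt;", ">" => "&gt;" }.freeze

def escape_pre_contents(html_string)
  html_string.gsub(PRE_BLOCK) do
    # Copy the captures first; the inner gsub below overwrites $1..$3.
    open_tag, inner, close_tag = $1, $2, $3
    open_tag + inner.gsub(/[<>]/, ENTITIES) + close_tag
  end
end

puts escape_pre_contents("<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>")
```

The lazy `.*?` keeps each match within a single block, and the block form of `gsub` means every block is replaced with its own encoded contents rather than with one precomputed string.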
