Reputation: 41
I am trying to parse a string of HTML with Ruby. The string contains multiple <pre></pre> tags, and I need to find and encode all < and > brackets between each pair of these tags.
Example:
string_1_pre = "<pre><h1>Welcome</h1></pre>"
string_2_pre = "<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>"
def clean_pre_code(html_string)
  matched = html_string.match(/(?<=<pre>).*(?=<\/pre>)/)
  cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
  html_string.gsub(/(?<=<pre>).*(?=<\/pre>)/, cleaned)
end
clean_pre_code(string_1_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre>"
clean_pre_code(string_2_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;&lt;/pre&gt;&lt;pre&gt;&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>"
This works as long as html_string contains only one <pre></pre> element, but not if there are multiple.
I would be open to a solution that uses Nokogiri or similar, but I couldn't figure out how to make it do what I want.
Please let me know if you need any additional context.
Update: This is possible only with Nokogiri, see accepted answer.
Upvotes: 0
Views: 134
Reputation: 1019
@zstrad44 Yes, you can do this with Nokogiri. Here is my version of the code, developed from yours; it gives the result you want when the string contains multiple pre tags.
require 'nokogiri'

def clean_pre_code(html_string)
  doc = Nokogiri::HTML(html_string)
  all_pre = doc.xpath('//pre')   # every <pre> element in the parsed document
  res = ""
  all_pre.each do |pre|
    pre = pre.to_html            # serialize this element back to a string
    matched = pre.match(/(?<=<pre>).*(?=<\/pre>)/)
    cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
    res += pre.gsub(/(?<=<pre>).*(?=<\/pre>)/, cleaned)
  end
  res
end
I would recommend reading the Nokogiri Cheatsheet to get a better understanding of the methods I used in the code. Happy coding! Hope this helps.
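For comparison, the reason the question's version breaks on several pre tags is that the greedy `.*` runs from the first <pre> to the last </pre>. Below is a minimal sketch of a standard-library-only variant that uses a lazy `.*?` and the block form of gsub, so each pre body is encoded on its own. The names `encode_pre_contents`, `PRE_BODY` and `BRACKETS` are hypothetical (they are not from the thread), and the sketch assumes the pre tags carry no attributes and are never nested; for anything messier a real parser is likely the safer choice.

```ruby
# Hypothetical helper, not from the thread: encodes < and > inside each
# <pre>...</pre> pair using only the Ruby standard library.
# Assumption: <pre> tags have no attributes and are never nested.
PRE_BODY = %r{(?<=<pre>).*?(?=</pre>)}m   # lazy match, /m lets . span newlines
BRACKETS = { "<" => "&lt;", ">" => "&gt;" }.freeze

def encode_pre_contents(html_string)
  # The block runs once per <pre> body, so each element is handled separately
  # and everything outside the <pre> tags is left untouched.
  html_string.gsub(PRE_BODY) { |body| body.gsub(/[<>]/, BRACKETS) }
end

puts encode_pre_contents("<pre><h1>Welcome</h1></pre><p>hi</p><pre><h1>Goodbye</h1></pre>")
# prints: <pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre><p>hi</p><pre>&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>
```

Unlike collecting only the pre elements, this keeps the rest of the string (here the <p>hi</p>) in place.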
Upvotes: 1