Reputation: 43
I'm trying to use this regex search: <div class="ms3">(\n.*?)+<
in Ruby, but as soon as I add the last character, "<", it stops matching altogether. I've tested it in Rubular, where the regex works perfectly fine. I'm writing my code in RubyMine, but I also ran it from PowerShell and got the same result, with no error message. When I run <div class="ms3">(\n.*?)+
it prints <div class="ms3">
which is exactly what I'm looking for, but as soon as I add the "<" it outputs nothing.
My code:
#!/usr/bin/ruby
# encoding: utf-8
File.open('ms3.txt', 'w') do |fo|
fo.puts File.foreach('input.txt').grep(/<div class="ms3">(\n.*?)+/)
end
Some of what I'm searching through:
<div class="ms3">
<span xml:lang="zxx"><span xml:lang="zxx">Still the tone of the remainder of the chapter is bleak. The</span> <span class="See_In_Glossary" xml:lang="zxx">DAY OF THE <span class="Name_Of_God" xml:lang="zxx">LORD</span></span> <span xml:lang="zxx">holds no hope for deliverance (5.16–18); the futility of offering sacrifices unmatched by common justice is once more underlined, and exile seems certain (5.21–27).</span></span>
</div>
<div class="Paragraph">
<span class="Verse_Number" id="idAMO_5_1" xml:lang="zxx">1</span><span class="scrText">Listen, people of Israel, to this funeral song which I sing over you:</span>
</div>
<div class="Stanza_Break"></div>
The full regex I need is <div class="ms3">(\n.*?)+<\/div>
It picks up the first section and nothing else.
Upvotes: 0
Views: 57
Reputation: 37409
Your problem starts with using File.foreach('input.txt')
which splits the file into lines. The pattern is then matched against each line separately, and a line can contain \n
only as its final character, so nothing can follow the newline within a single line. As soon as the pattern requires a character after the (\n.*?)+ group, such as "<", no line can match.
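To make the difference concrete, here is a minimal sketch that runs the same pattern line by line and against the whole text. The sample markup is a shortened, hypothetical stand-in for the question's input, and a temporary file is used only so that File.foreach can be called as in the original code.

```ruby
require 'tempfile'

# Hypothetical sample, shortened from the markup in the question.
sample = "<div class=\"ms3\">\n<span>text</span>\n</div>\n"
pattern = /<div class="ms3">(\n.*?)+<\/div>/

# Write the sample to a temporary file so File.foreach can read it.
file = Tempfile.new('input')
file.write(sample)
file.close

# File.foreach yields one line at a time; each line ends at its newline.
lines = File.foreach(file.path).to_a
# grep tests the pattern against each line separately.
line_matches = lines.grep(pattern)
# File.read returns the whole file as one string, newlines included.
whole_match = File.read(file.path).match(pattern)
file.unlink

p lines         # three separate strings, each ending in "\n"
p line_matches  # => []
p whole_match[0]
```

The per-line grep finds nothing because no single line contains both the opening tag and the closing </div>, while the whole-file match spans all three lines.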
You should have better luck reading the whole file into a single string and calling match
on it:
File.read('input.txt').match(/<div class="ms3">(\n.*?)+<\/div>/)
# => #<MatchData "<div class=\"ms3\">\n <span xml:lang=\"zxx\">
# => <span xml:lang=\"zxx\">Still the tone of the remainder of the chapter is bleak. The</span>
# => <span class=\"See_In_Glossary\" xml:lang=\"zxx\">DAY OF THE
# => <span class=\"Name_Of_God\" xml:lang=\"zxx\">LORD</span></span>
# => <span xml:lang=\"zxx\">holds no hope for deliverance (5.16–18);
# => the futility of offering sacrifices unmatched by common justice is once more
# => underlined, and exile seems certain (5.21–27).</span></span>\n </div>" 1:"\n ">
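Note that match returns only the first occurrence. If the input holds several ms3 blocks, String#scan collects all of them. The sketch below assumes that an ms3 div never contains a nested div, and it swaps the (\n.*?)+ group for a lazy .*? with the /m flag, which lets . match newlines in Ruby. The sample text is hypothetical; the real input would come from File.read('input.txt').

```ruby
# Hypothetical sample with two ms3 blocks and one unrelated block.
text = <<~HTML
  <div class="ms3">
  <span>first</span>
  </div>
  <div class="Paragraph">
  <span>skipped</span>
  </div>
  <div class="ms3">
  <span>second</span>
  </div>
HTML

# /m makes . match newlines; the lazy .*? stops at the first </div>,
# so each ms3 block becomes its own match.
blocks = text.scan(/<div class="ms3">.*?<\/div>/m)

puts blocks
```

Because the pattern has no capture groups, scan returns an array of the matched strings, which could then be written out with something like File.write('ms3.txt', blocks.join("\n")).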
Upvotes: 1