I want to match a piece of HTML that starts at "..." and runs up to the next "..." or to the end of the input.
Currently I have the following regex:
(<font color=\"#777777\">\.\.\. .+?<\/font>)
It matches only this:
1. <font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font>
2. <font color="#777777">... You may not want them to be </font>
3. <font color="#777777">... </font><font color="#000000">the web link, and </font>
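A minimal reproduction of this behaviour, assuming Ruby (the demo link points to Rubular, a Ruby regex tester). The names html and current are illustrative. html is the sample HTML quoted in this question, broken across lines only for readability and joined back into one line.

```ruby
# Sample input from the question; newlines are deleted so it is one line again.
html = <<~'HTML'.delete("\n")
  <font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font>
  <font color="#777777">obviously thinking about your merchandise </font><font color="#000000">in case they have clicked about the link in your email.</font>
  <font color="#777777">... You may not want them to be </font><font color="#000000">disappointed by simply clicking </font>
  <font color="#777777">... </font><font color="#000000">the web link, and </font>
  <font color="#777777">finding </font><font color="#000000">the page to </font>
  <font color="#777777">get other than </font><font color="#000000">what they thought it </font>
  <font color="#777777">will be.. If America makes</font>
HTML

# The original pattern. `.+?` is lazy, so each match stops at the first
# "</font>" that comes after at least one character following "... ".
current = /(<font color=\"#777777\">\.\.\. .+?<\/font>)/

# One capture group, so scan returns [[group1], [group1], ...].
matches = html.scan(current).map(&:first)
matches.each_with_index { |m, i| puts "#{i + 1}. #{m}" }
```

Nothing in the pattern says "stop only before the next ...", so the lazy quantifier settles for the nearest closing tag.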
What I want instead is:
1. <font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font><font color="#777777">obviously thinking about your merchandise </font><font color="#000000">in case they have clicked about the link in your email.</font>
2. <font color="#777777">... You may not want them to be </font><font color="#000000">disappointed by simply clicking </font>
3. <font color="#777777">... </font><font color="#000000">the web link, and </font><font color="#777777">finding </font><font color="#000000">the page to </font><font color="#777777">get other than </font><font color="#000000">what they thought it </font><font color="#777777">will be.. If America makes</font>
Here is the HTML I want to parse:
<font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font><font color="#777777">obviously thinking about your merchandise </font><font color="#000000">in case they have clicked about the link in your email.</font><font color="#777777">... You may not want them to be </font><font color="#000000">disappointed by simply clicking </font><font color="#777777">... </font><font color="#000000">the web link, and </font><font color="#777777">finding </font><font color="#000000">the page to </font><font color="#777777">get other than </font><font color="#000000">what they thought it </font><font color="#777777">will be.. If America makes</font>
Demo: http://rubular.com/r/mmQ4TBZb96
How can I match every chunk that starts with "..." and runs until the next "..." so that I get the desired matches above?
Thanks for the help!
Answer:
Even though your question seems inconsistent (I don't understand why you would get the final desired match), I think this is what you're after:
((<font color=\"#777777\">\.{3}) .+?(<\/font>(?=\s*\2)|$))
It uses a look-ahead so that each match ends at the </font> just before the next "..." tag, or at the end of the input.
See this on Rubular.
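A minimal sketch of the look-ahead approach, assuming Ruby since the demos run on Rubular. It applies the pattern above unchanged via String#scan; html and pattern are illustrative names, and html is the question's sample input.

```ruby
# Sample input from the question; newlines are deleted so it is one line again.
html = <<~'HTML'.delete("\n")
  <font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font>
  <font color="#777777">obviously thinking about your merchandise </font><font color="#000000">in case they have clicked about the link in your email.</font>
  <font color="#777777">... You may not want them to be </font><font color="#000000">disappointed by simply clicking </font>
  <font color="#777777">... </font><font color="#000000">the web link, and </font>
  <font color="#777777">finding </font><font color="#000000">the page to </font>
  <font color="#777777">get other than </font><font color="#000000">what they thought it </font>
  <font color="#777777">will be.. If America makes</font>
HTML

# Group 2 captures '<font color="#777777">...'. A match ends either at a
# '</font>' followed (after optional whitespace) by that same text, checked
# with a look-ahead so it is not consumed, or at the end of the line ($).
pattern = /((<font color=\"#777777\">\.{3}) .+?(<\/font>(?=\s*\2)|$))/

# With capture groups, scan yields one array per match; group 1 is the whole match.
matches = html.scan(pattern).map(&:first)
matches.each_with_index { |m, i| puts "#{i + 1}. #{m}" }
```

Because the look-ahead consumes nothing, the three matches sit back to back and joined together reproduce html exactly. In Ruby, $ matches at the end of any line and . does not cross newlines unless the m flag is set, so HTML spanning several lines would probably need the m flag and \z in place of $.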
Answer:
The question asks for a regex, but you could also split the text just before each "..." tag (Perl syntax, though I believe similar functions exist in other languages too):
split(/(?=<font color=\"#777777\">\.\.\.)/, $your_text)
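For comparison, a sketch of the same split using Ruby's String#split, which also accepts a regex containing a look-ahead (Ruby is assumed to match the Rubular demos; html, marker and pieces are illustrative names).

```ruby
# Sample input from the question; newlines are deleted so it is one line again.
html = <<~'HTML'.delete("\n")
  <font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font>
  <font color="#777777">obviously thinking about your merchandise </font><font color="#000000">in case they have clicked about the link in your email.</font>
  <font color="#777777">... You may not want them to be </font><font color="#000000">disappointed by simply clicking </font>
  <font color="#777777">... </font><font color="#000000">the web link, and </font>
  <font color="#777777">finding </font><font color="#000000">the page to </font>
  <font color="#777777">get other than </font><font color="#000000">what they thought it </font>
  <font color="#777777">will be.. If America makes</font>
HTML

# Zero-width split point right before each '<font color="#777777">...'.
# The look-ahead consumes nothing, so every piece keeps its leading tag.
marker = /(?=<font color="#777777">\.\.\.)/

# Drop empty strings defensively; harmless if split produced none.
pieces = html.split(marker).reject(&:empty?)
pieces.each_with_index { |piece, i| puts "#{i + 1}. #{piece}" }
```

Splitting avoids the end-of-match condition altogether: every piece simply runs until the next split point or the end of the string.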