all jazz

Reputation: 2017

Matching everything until the next match

I want to match HTML starting at each occurrence of "..." and continuing until the next occurrence of "..." or the end of the input.

Currently I have the following regex:

(<font color=\"#777777\">\.\.\. .+?<\/font>)

It matches only the following:

1. <font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font>
2. <font color="#777777">... You may not want them to be </font>
3. <font color="#777777">... </font><font color="#000000">the web link, and </font>

But I would want:

1. <font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font><font color="#777777">obviously thinking about your merchandise </font><font color="#000000">in case they have clicked about the link in your email.</font>
2. <font color="#777777">... You may not want them to be </font><font color="#000000">disappointed by simply clicking </font>
3. <font color="#777777">... </font><font color="#000000">the web link, and </font><font color="#777777">finding </font><font color="#000000">the page to </font><font color="#777777">get other than </font><font color="#000000">what they thought it </font><font color="#777777">will be.. If America makes</font>

Here is the html that I want to parse:

<font color="#777777">... </font><font color="#000000">lives up to the customer's expectations. The subscriber is </font><font color="#777777">obviously thinking about your merchandise  </font><font color="#000000">in case they have clicked about the link in your email.</font><font color="#777777">... You may not want them to be </font><font color="#000000">disappointed by simply clicking </font><font color="#777777">... </font><font color="#000000">the web link, and </font><font color="#777777">finding  </font><font color="#000000">the page to </font><font color="#777777">get other than  </font><font color="#000000">what they thought it </font><font color="#777777">will be.. If America makes</font>

And demonstration: http://rubular.com/r/mmQ4TBZb96

How can I match every chunk of text that starts with "..." so that I get the desired matches above?

Thanks for help!

Upvotes: 4

Views: 175

Answers (2)

Bohemian

Reputation: 425033

Even though your question seems inconsistent (I don't understand why you would get the final desired match), I think this is what you're after:

((<font color=\"#777777\">\.{3}) .+?(<\/font>(?=\s*\2)|$))

It uses a look-ahead so that each match ends just before the next "..." sequence (or at the end of input).

See this on rubular
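
A minimal sketch of how the pattern above behaves, written in Python as an assumption (the thread itself only mentions Rubular and Perl). The sample string is an abbreviated version of the HTML from the question, and the names html, pattern and matches are illustrative only.

```python
import re

# Abbreviated sample of the question's HTML (some text shortened for readability).
html = (
    '<font color="#777777">... </font>'
    '<font color="#000000">lives up to </font>'
    '<font color="#777777">obviously thinking </font>'
    '<font color="#000000">in your email.</font>'
    '<font color="#777777">... You may not want them to be </font>'
    '<font color="#000000">disappointed by simply clicking </font>'
    '<font color="#777777">... </font>'
    '<font color="#000000">the web link, and </font>'
    '<font color="#777777">will be.. If America makes</font>'
)

# Same pattern as above; the backslashes before the quotes and the slash
# are dropped because Python does not need them.
# Group 2 captures the opening tag plus "...", and \2 inside the look-ahead
# requires that exact text to follow the closing </font>.
pattern = re.compile(r'((<font color="#777777">\.{3}) .+?(</font>(?=\s*\2)|$))')

matches = [m.group(1) for m in pattern.finditer(html)]

for number, chunk in enumerate(matches, 1):
    print(number, chunk)
```

Each match starts at a "..." marker and stops right before the next one; the last match runs to the end of the input through the $ alternative. If the real input spans several lines, the re.DOTALL flag would probably be needed so that . can cross newlines.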

Upvotes: 2

Vasiliy

Reputation: 16228

The question asks for a regexp match, but you could also split the text instead (Perl syntax, but I believe this kind of function exists in other languages too):

split(/(?=<font color=\"#777777\">\.\.\.)/, $your_text)
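
As a rough equivalent in another language, here is a sketch of the same split in Python (an assumption, since the thread does not say which language the asker uses). It relies on re.split accepting a pattern that can match the empty string, which requires Python 3.7 or later. The sample string is an abbreviated version of the question's HTML, and the names html, marker and pieces are illustrative only.

```python
import re

# Abbreviated sample of the question's HTML (some text shortened for readability).
html = (
    '<font color="#777777">... </font>'
    '<font color="#000000">lives up to </font>'
    '<font color="#777777">obviously thinking </font>'
    '<font color="#000000">in your email.</font>'
    '<font color="#777777">... You may not want them to be </font>'
    '<font color="#000000">disappointed by simply clicking </font>'
    '<font color="#777777">... </font>'
    '<font color="#000000">the web link, and </font>'
    '<font color="#777777">will be.. If America makes</font>'
)

# Zero-width look-ahead: split *before* every gray font tag that starts with "...",
# without consuming any characters.
marker = r'(?=<font color="#777777">\.\.\.)'

# The split position at the very start of the string yields an empty first
# element in Python, so empty pieces are filtered out.
pieces = [piece for piece in re.split(marker, html) if piece]

for number, piece in enumerate(pieces, 1):
    print(number, piece)
```

Because the look-ahead consumes nothing, joining the pieces back together reproduces the original string exactly.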

Upvotes: 0
