I'm editing an epub made by some kids as a school assignment. The file often contains copy/paste errors, so I've exported the whole thing to an XHTML file. Using Sublime Text (if that matters), I need to find out whether the last 4 or 5 words before the </p> tag are already present earlier on the same line (or, even better, after the matching <p> tag).
For example, this is what I find very often:
<p>This is a whole paragraph that shouldn't contain any repetition. that shouldn't contain any repetition.</p>
There are some examples here and elsewhere on the web about finding repetitions, but they always look forward, while I need to find the repetition backward (or at least it seems so to me).
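Taken literally, the check described above (do the last few words before </p> also appear earlier in the same paragraph?) can also be done without a regex. The following is a minimal Python sketch, not part of the original question. It assumes each <p>...</p> sits on a single line and is not nested, splits words on whitespace (so punctuation counts as part of a word), and uses a hypothetical helper name, tail_repeated_earlier, with a default of 4 words.

```python
import re

# Assumption: each <p>...</p> is on one line and paragraphs are not nested.
PARAGRAPH = re.compile(r"<p>(.*?)</p>")

def tail_repeated_earlier(paragraph: str, n_words: int = 4) -> bool:
    """Hypothetical helper: True if the last n_words words of the paragraph
    also occur, in the same order, earlier in that paragraph."""
    words = paragraph.split()  # whitespace split; punctuation stays attached
    if len(words) < 2 * n_words:
        return False  # too short to contain the tail twice without overlap
    tail = words[-n_words:]
    earlier = words[:-n_words]
    # Slide a window of n_words over the earlier words and compare to the tail.
    return any(earlier[i:i + n_words] == tail
               for i in range(len(earlier) - n_words + 1))

line = ("<p>This is a whole paragraph that shouldn't contain any repetition. "
        "that shouldn't contain any repetition.</p>")
for body in PARAGRAPH.findall(line):
    print(tail_repeated_earlier(body))  # True
```

Because the comparison is word-for-word, "repetition." and "repetition" count as different words; normalizing punctuation first would be a separate design choice.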
I'll assume the <p> tags aren't there, since with them the statement doesn't end with the repetition (the </p> comes after it).
So, if the text is just:
This is a whole paragraph that shouldn't contain any repetition. that shouldn't contain any repetition.
Then you could use something like this:
(.+)\1
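A quick way to see what this pattern does is to run it through Python's re module, used here purely for illustration (Sublime Text's find panel has its own regex engine, which may differ in details). On the sample text, the unanchored pattern reports the first doubled substring it finds, which is "is " from "This is", so anchoring it to the end of the text with $ is one way to restrict it to a trailing repetition:

```python
import re

text = ("This is a whole paragraph that shouldn't contain any repetition. "
        "that shouldn't contain any repetition.")

# Unanchored: (.+)\1 returns the leftmost doubled substring anywhere in the text.
m = re.search(r"(.+)\1", text)
print(repr(m.group(1)))  # 'is '  (from "This is a")

# Anchored with $: the doubled substring must run to the end of the text.
m = re.search(r"(.+)\1$", text)
print(repr(m.group(1)))  # " that shouldn't contain any repetition."
```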
Update: as revo pointed out in a comment, you can use a positive lookahead so that the repetition only matches when it is immediately followed by </p>:
(.+)\1(?=<\/p>)
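The sketch below (Python's re again, for illustration only) applies the lookahead version to the original tagged line. Because (.+)\1 accepts any doubled run of characters, it also fires on lines such as <p>It is full</p> (the "ll" before </p>), so the sketch adds a stricter variant, not part of the original answer, that requires the repeated chunk to be at least four whitespace-separated words, closer to the "last 4 or 5 words" in the question. flag_lines is a hypothetical helper for scanning the exported file line by line.

```python
import re

line = ("<p>This is a whole paragraph that shouldn't contain any repetition. "
        "that shouldn't contain any repetition.</p>")

# Lookahead version: the doubled text must be immediately followed by </p>.
LOOSE = re.compile(r"(.+)\1(?=</p>)")

# Stricter variant (an assumption): the repeated chunk must be at least four
# whitespace-separated words, so short doubles like "ll" are ignored.
STRICT = re.compile(r"((?:\s+\S+){4,})\1(?=</p>)")

print(repr(LOOSE.search(line).group(1)))   # " that shouldn't contain any repetition."
print(repr(STRICT.search(line).group(1)))  # " that shouldn't contain any repetition."

print(repr(LOOSE.search("<p>It is full</p>").group(1)))  # 'l'  (false positive)
print(STRICT.search("<p>It is full</p>"))                # None

def flag_lines(lines):
    """Hypothetical helper: yield (line_number, repeated_text) for each
    paragraph that ends in a repeated run of four or more words."""
    for number, text in enumerate(lines, start=1):
        for match in STRICT.finditer(text):
            yield number, match.group(1)
```

Passing an open file object (opened with encoding="utf-8") to flag_lines would yield the line number and repeated text for each hit. The STRICT pattern uses only common regex syntax, so it may also work in Sublime Text's find panel, but it is worth testing there first.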