Parduz

Reputation: 602

How to use regex to find repeated phrases?

I'm editing an EPUB made by some kids as a school assignment. The file often contains copy/paste errors. So I've exported the whole thing to an XHTML file and, using Sublime Text (if this matters), I need to find out whether the last 4 or 5 words before the </p> tag already appear earlier in the same line (or, even better, after the corresponding <p> tag).

As an example, this is what I find very often:

<p>This is a whole paragraph that shouldn't contain any repetition. that shouldn't contain any repetition.</p>

There are some examples here and on the web about finding repetitions, but they always look forward, while I need to find the repetition backward (or at least it seems so to me).

Upvotes: 1

Views: 263

Answers (1)

Federico Piazza

Reputation: 31035

I'll assume that the <p> tags aren't there, since with them the line ends with </p> rather than with the repetition.

So, if the text is just:

This is a whole paragraph that shouldn't contain any repetition. that shouldn't contain any repetition.

Then you could use something like this:

(.+)\1


Update: as revo pointed out in his comment, you can use a positive lookahead to require that the repetition sits right before the closing </p> tag:

(.+)\1(?=<\/p>)
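To see how the two patterns behave on the sample line from the question, here is a minimal Python sketch. The helper name `trailing_repeat` and the `clean` sample are made up for illustration, and Python's `re` module is only a stand-in for the Sublime Text search box, so small syntax differences are possible.

```python
import re

# Anchored pattern from the answer: some text, immediately repeated,
# with the second copy ending right before </p>.
TRAILING_REPEAT = re.compile(r"(.+)\1(?=</p>)")


def trailing_repeat(line):
    """Return the duplicated text at the end of a paragraph, or None."""
    match = TRAILING_REPEAT.search(line)
    return match.group(1) if match else None


# Sample line taken from the question.
sample = ("<p>This is a whole paragraph that shouldn't contain any repetition."
          " that shouldn't contain any repetition.</p>")
# Hypothetical paragraph without a duplicated ending.
clean = "<p>This is a paragraph without a duplicated ending.</p>"

print(repr(trailing_repeat(sample)))  # ' that shouldn\'t contain any repetition.'
print(repr(trailing_repeat(clean)))   # None

# Without the lookahead, the first match is an incidental repeat:
# "is " + "is " inside "This is a".
print(repr(re.search(r"(.+)\1", sample).group(1)))  # 'is '
```

The last line shows why the lookahead matters: the bare `(.+)\1` stops at the first doubled fragment it finds anywhere in the line, while the anchored version only reports a repetition that ends the paragraph. Note that both patterns only catch a copy that immediately follows the original text.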

Upvotes: 3
