Boriss

Reputation: 554

How to detect and remove duplicate sentences in a string?

I just got help from all of you with another problem, and I was wondering if this next issue of mine can be solved easily as well.

Basically, I'm stuck with a poorly converted PDF-to-Excel file, so I have a lot of duplicate sentences in each cell.

For example:

$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";

$good_string = goodFunction($bad_string);
echo $good_string; // should print 'B7R, B9R, B12R, B12M 430mm Disc 2005 >'

How on earth can this be done? The only condition is that the same string is repeated X times. It never changes; it is just copied and pasted in place many times (due to the bad PDF-to-Excel conversion).

Is there ANY solution for this?

Upvotes: 0

Views: 782

Answers (1)

Avinash Raj

Reputation: 174696

I'd use preg_replace. I assume that the duplicates are contiguous, i.e. the whole string is one sentence repeated back to back.

$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~^(.*?)\1+$~', '\1', $bad_string);

Output:

B7R, B9R, B12R, B12M 430mm Disc 2005 >
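If you want this as the reusable function from the question, a minimal sketch could look like the one below. The name collapseRepeats is hypothetical, and the "s" flag is my assumption in case a cell contains line breaks. I also use (.+?) instead of (.*?) so the captured unit can never be empty.

```php
<?php
// Hypothetical helper: collapse a string that consists of one unit
// repeated back to back into a single copy of that unit.
function collapseRepeats(string $text): string
{
    // (.+?) lazily captures the shortest candidate unit; \1+ requires the
    // rest of the string to be that same unit repeated one or more times.
    // The "s" flag lets "." match newlines (assumption: cells may be multi-line).
    $result = preg_replace('~\A(.+?)\1+\z~s', '$1', $text);

    // preg_replace returns null on error (for example when the backtrack
    // limit is hit), so fall back to the untouched input in that case.
    return $result ?? $text;
}

$unit = 'B7R, B9R, B12R, B12M 430mm Disc 2005 >';
echo collapseRepeats(str_repeat($unit, 6)), "\n"; // prints the unit once
echo collapseRepeats('no repeats here'), "\n";    // unchanged: no repetition found
```

Note that the lazy quantifier always finds the shortest repeating unit, so a cell whose legitimate content is something like "abab" would also be collapsed to "ab".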


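If the sentences must end with a > symbol, then you could use this regex. It removes each sentence that has another copy at the very end of the string, so only that final copy survives.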

(.*?>)(?=(?:.*?\1)+$)


$bad_string = "foo B7R, B9R, B12R, B12M 430mm Disc 2005 > bar B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~(.*?>)(?=(?:.*?\1)+$)~', '', $bad_string);

Output:

foo  bar B7R, B9R, B12R, B12M 430mm Disc 2005 >
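As a different technique (not the regex above), you could also split the cell after every > and keep only the first occurrence of each sentence. This is a rough sketch that assumes every sentence ends with >; the name uniqueSentences is hypothetical. Unlike the lookahead regex, it compares whole sentences only, so "foo B7R ... >" and "B7R ... >" count as different sentences here.

```php
<?php
// Hypothetical helper: keep the first occurrence of each ">"-terminated sentence.
function uniqueSentences(string $text): string
{
    // Split right after every ">" so each piece keeps its terminator.
    $parts = preg_split('~(?<=>)~', $text, -1, PREG_SPLIT_NO_EMPTY);

    $seen = [];
    $kept = [];
    foreach ($parts as $part) {
        // Compare sentences ignoring surrounding whitespace.
        $key = trim($part);
        if ($key === '' || isset($seen[$key])) {
            continue; // blank piece or a sentence we have already kept
        }
        $seen[$key] = true;
        $kept[] = $part; // keep the original piece, spacing included
    }

    return implode('', $kept);
}

$unit = 'B7R, B9R, B12R, B12M 430mm Disc 2005 >';
echo uniqueSentences(str_repeat($unit, 6)), "\n"; // prints the unit once
echo uniqueSentences('A >B >A >C >'), "\n";       // prints "A >B >C >"
```

This also handles duplicates that are not adjacent, at the cost of treating the cell as a list of sentences rather than as one repeated block.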

Upvotes: 2
