Reputation: 554
I just got help from all of you with another problem, and I was wondering if this next issue of mine can be solved just as easily.
Basically, because I am stuck with a poorly converted PDF-to-Excel file, I have a lot of duplicated sentences in each cell.
For example:
$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
$good_string = goodFunction($bad_string);
// expected: 'B7R, B9R, B12R, B12M 430mm Disc 2005 >'
How the hell is this possible? The one thing I can rely on is that the same sentence is repeated X times in a row. It never changes; it is simply copied and pasted in place many times (due to the bad PDF-to-Excel conversion).
Is there ANY solution for this?
Upvotes: 0
Views: 782
Reputation: 174696
I'd use preg_replace. I assume the duplicated strings are contiguous, i.e. the whole cell is one sentence repeated back to back.
$bad_string = "B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~^(.*?)\1+$~', '\1', $bad_string);
Output:
B7R, B9R, B12R, B12M 430mm Disc 2005 >
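To connect this back to the question, here is a minimal sketch that wraps the same pattern in a function. The name goodFunction is taken from the question and is not a PHP built-in. The `s` flag and the null fallback are my additions, on the assumption that a cell might contain line breaks or trip a PCRE limit.

```php
<?php
// Hypothetical helper; the name comes from the question, not from PHP itself.
function goodFunction(string $text): string
{
    // ^(.*?)\1+$ : the shortest prefix that, repeated two or more times,
    // makes up the entire string. The "s" flag lets "." match newlines.
    $result = preg_replace('~^(.*?)\1+$~s', '$1', $text);

    // preg_replace returns null on error (e.g. backtrack limit hit);
    // in that case hand back the input unchanged.
    return $result ?? $text;
}

echo goodFunction("abc >abc >abc >"), "\n";    // abc >
echo goodFunction("no repetition here"), "\n"; // no repetition here
```

A string that is not a whole-number repetition of some prefix does not match the pattern, so it comes back untouched.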
If the sentences must end with a > symbol, then you could use this regex.
(.*?>)(?=(?:.*?\1)+$)
$bad_string = "foo B7R, B9R, B12R, B12M 430mm Disc 2005 > bar B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >B7R, B9R, B12R, B12M 430mm Disc 2005 >";
echo preg_replace('~(.*?>)(?=(?:.*?\1)+$)~', '', $bad_string);
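Output (the spaces on either side of each removed copy remain, so two spaces are left between foo and bar):
foo  bar B7R, B9R, B12R, B12M 430mm Disc 2005 >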
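The second pattern is easier to follow when written out with the `x` flag, which allows whitespace and comments inside the regex. This is only a sketch: stripEarlierCopies is a made-up name, and the short input is a stand-in for the real cell contents.

```php
<?php
// Hypothetical helper: removes every copy of a ">"-terminated sentence
// except the last one, leaving any other text in place.
function stripEarlierCopies(string $text): string
{
    $pattern = '~
        (.*?>)           # group 1: shortest run of text ending in ">"
        (?=              # lookahead: match only if, later on,
            (?:.*?\1)+   #   the same text appears again at least once
            $            #   and the final copy ends the string
        )
    ~x';

    // Fall back to the input if preg_replace reports an error (null).
    return preg_replace($pattern, '', $text) ?? $text;
}

echo stripEarlierCopies("foo X > bar X >X >X >"), "\n"; // foo  bar X >
```

Only the repeated sentence is deleted, so the whitespace around each removed copy survives; a follow-up such as preg_replace('~\s{2,}~', ' ', $result) would collapse those runs if that matters.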
Upvotes: 2