Reputation: 3280
I have a body of text stored as a string. It contains multiple substrings that I want to replace with a part of themselves. This is a typical example (note that the string contains several such substrings).
$String = "loads of text [[gibberish text|Text i Want]] more text [[gibberish text|Text i Want]] more text [[if no separator then just remove tags]]";
$String = deleteStringBetweenStrings("[[", "|", $String , true);
deleteStringBetweenStrings is a recursive function that deletes all text between the two substrings (including the substrings themselves). It does what I want for the first substring but goes a bit crazy after that.
function deleteStringBetweenStrings($beginning, $end, $string, $recursive)
{
    $beginningPos = strpos($string, $beginning);
    $endPos = strpos($string, $end);
    if ($beginningPos === false || $endPos === false)
    {
        return $string;
    }
    $textToDelete = substr($string, $beginningPos, ($endPos + strlen($end)) - $beginningPos);
    $string = str_replace($textToDelete, '', $string);
    if (strpos($string, $beginning) && strpos($string, $end) && $recursive == true)
    {
        $string = deleteStringBetweenStrings($beginning, $end, $string, $recursive);
    }
    return $string;
}
Is there a more efficient way for me to do this?
Expected output = "loads of text Text i Want more text Text i Want more text if no separator then just remove tags"
Upvotes: 1
Views: 272
Reputation: 174706
Regex and regex only....
Just use the regex below to match the text you don't want, then replace it with an empty string.
(?<=\[\[)(?:(?!\]]|\|).)*\||\[\[|\]\]
Code:
<?php
$str = "loads of text [[gibberish text|Text i Want]] more text [[gibberish text|Text i Want]] more text [[if no separator then just remove tags]]";
echo preg_replace("/(?<=\[\[)(?:(?!\]]|\|).)*\||\[\[|\]\]/m", "", $str);
?>
Output:
loads of text Text i Want more text Text i Want more text if no separator then just remove tags
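For readability, the same pattern can also be written with the x flag, which lets each piece carry its own comment. This is only a sketch: the variable names $pattern and $result are mine, and the inner ]] is written as \]\] instead of \]], which should match the same text.

```php
<?php
$str = "loads of text [[gibberish text|Text i Want]] more text [[gibberish text|Text i Want]] more text [[if no separator then just remove tags]]";

// x flag: whitespace in the pattern is ignored and # starts a comment
$pattern = '/
    (?<=\[\[)             # start only right after an opening [[
    (?:(?!\]\]|\|).)*     # characters up to, but not including, ]] or |
    \|                    # a literal |, so text without a separator is skipped
  | \[\[                  # or the opening [[ itself
  | \]\]                  # or the closing ]] itself
/x';

// Replace every match with an empty string, as in the one-line version.
$result = preg_replace($pattern, "", $str);
echo $result;
```

The comments inside the pattern deliberately avoid the / character and single quotes, since those would end the pattern or the PHP string early.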
How did I figure it out?
(?<=\[\[) is a lookbehind: the match must start right after the [[ symbols.
(?:(?!\]]|\|).)* matches any character, zero or more times, as long as it does not begin ]] and is not a | symbol.
\| matches a literal | symbol. This ensures that the match must contain a | symbol before it reaches the closing brackets ]], so gibberish text| is matched only in a [[gibberish text|Text i Want]] type of string, and a string like [[if no separator then just remove tags]] is not touched.
| means OR.
\[\[ matches the [[ symbols.
| means OR.
\]\] matches the ]] symbols.
Removing all the matched characters will give you the desired output.
Upvotes: 1
Reputation: 3097
Something like this should do the trick (whilst preserving the ability to add your own start and end strings):
function deleteStringBetweenStrings($start, $end, $string) {
    // create a pattern from the input and make it safe to use in a regular expression
    $pattern = '|' . preg_quote($start) . '(.*)' . preg_quote($end) . '|U';
    // replace every occurrence of this pattern with an empty string in full $string
    return preg_replace($pattern, '', $string);
}
$String = "loads of text [[gibberish text|Text i Want]] more text [[gibberish text|Text i Want]] more text [[if no separator then just remove tags]]";
$String = deleteStringBetweenStrings("[[", "|", $String);
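As far as I can tell, the call above removes only the [[gibberish text| parts, so the closing ]] markers and the tag without a separator are still in the string. Here is a minimal sketch of one way to finish the job, assuming the markers are always [[ and ]]. The str_replace step is my own addition and not part of the function above:

```php
<?php
// Same function as above, repeated so this sketch runs on its own.
function deleteStringBetweenStrings($start, $end, $string) {
    $pattern = '|' . preg_quote($start) . '(.*)' . preg_quote($end) . '|U';
    return preg_replace($pattern, '', $string);
}

$String = "loads of text [[gibberish text|Text i Want]] more text [[gibberish text|Text i Want]] more text [[if no separator then just remove tags]]";

// Step 1: remove everything from [[ up to and including the separator.
$String = deleteStringBetweenStrings("[[", "|", $String);

// Step 2 (assumed extra step): drop whatever tag markers are left.
$String = str_replace(array("[[", "]]"), "", $String);

echo $String;
```

This relies on every tag without a separator coming after the tags that have one, as in the sample string. Otherwise the ungreedy match can run from one tag into the next.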
Upvotes: 1
Reputation: 4025
Try this one:
$string = 'loads of text [[gibberish text|Text i Want]] more text [[gibberish text|Text i Want]] more text [[if no separator then just remove tags]]';
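function doReplace($matches) {
    // $matches[2] is the text between [[ and ]]
    $str = $matches[2];
    // compare with false: strpos() returns 0 when | is the first character
    if (strpos($str, '|') !== false) {
        $parts = explode('|', $str);
        return $parts[1];
    } else {
        return $str;
    }
}
echo preg_replace_callback('/(\[\[(.*?)\]\])/', 'doReplace', $string);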
It echoes
loads of text Text i Want more text Text i Want more text if no separator then just remove tags
Which I think is exactly what you want!
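If the text you want can itself contain a | character, a variation with a closure and a limit on explode keeps everything after the first separator. This is just a sketch; the helper name unwrapTags and the sample input are made up for illustration:

```php
<?php
// Hypothetical helper: replaces each [[...]] tag with the text after its first |,
// or with the whole inner text when there is no separator.
function unwrapTags($string) {
    return preg_replace_callback('/\[\[(.*?)\]\]/', function ($matches) {
        // limit of 2: split only on the first |, so later | characters stay in the text
        $parts = explode('|', $matches[1], 2);
        return count($parts) === 2 ? $parts[1] : $parts[0];
    }, $string);
}

echo unwrapTags('more text [[gibberish text|Text i Want|and more]] end');
// prints: more text Text i Want|and more end
```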
Upvotes: 0