Mark

Reputation: 647

Need to keep newlines in pre tags while removing newlines from all other text

I have user-submitted text from a form that may contain multiple blocks of text in <pre> tags. I need to remove all newlines except those inside the <pre> blocks, while preserving any other user formatting.

Upvotes: 2

Views: 260

Answers (2)

anubhava

Reputation: 785088

You will need to use conditional subpatterns here. Assuming the <pre> and </pre> tags are balanced, the following code should work for you:

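$str = "abc \nfoo\n <pre>123\ndef\nabc\n</pre>qwer\nttt\n bbb";
// Group 1 optionally captures a whole <pre>...</pre> block plus the rest of its line;
// the trailing \n sits outside the group, so the replacement drops it.
$p = '~((<pre>)(?(2).*?</pre>)(?:[^\n]*?))*?\n~smi';
$s = preg_replace($p, "$1", $str);
var_dump($s);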

OUTPUT

string(42) "abc foo <pre>123
def
abc
</pre>qwerttt bbb"

As you can see, the \n characters outside <pre>...</pre> have been removed.

This code searches for 0 or more occurrences of a <pre> tag in the string and, if one is found, grabs the text up to the closing </pre> tag. A single iteration of the search stops when the first \n is found, and the matched text is then replaced with the first captured group (i.e. the text that came before the \n), which drops the \n itself.
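One case worth checking with the pattern above is a <pre> block that is not followed by any newline, for example at the very end of the input, since the match needs a trailing \n to succeed. As a rough alternative sketch (a different technique, not the one from this answer), PCRE's backtracking verbs (*SKIP)(*F) can be used to step over each block and remove only the line breaks that remain. The function name strip_newlines_outside_pre is made up for illustration, and the sketch still assumes balanced tags.

```php
<?php
// Hypothetical helper (not part of the answer above): removes line breaks
// that are outside <pre>...</pre> blocks. Assumes the tags are balanced.
function strip_newlines_outside_pre(string $text): string
{
    // Alternative 1: match a whole <pre ...>...</pre> block, then (*SKIP)(*F)
    //   discards that match and resumes scanning right after the block.
    // Alternative 2: a line break (\n or \r\n) outside any block, which is removed.
    $pattern = '~<pre\b[^>]*>.*?</pre>(*SKIP)(*F)|\r?\n~si';

    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '', $text) ?? $text;
}

$str = "abc \nfoo\n <pre>123\ndef\nabc\n</pre>qwer\nttt\n bbb";
var_dump(strip_newlines_outside_pre($str));
```

For the sample string this yields the same 42-character result as shown in the OUTPUT above.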

Upvotes: 1

Sam Dufel

Reputation: 17598

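$input = "abc \nfoo\n <pre>123\ndef\n</pre>qwer\nttt"; // example value; substitute the submitted text

// Split on the opening tag: every piece after the first starts inside a <pre> block.
$tokenized_input = explode('<pre>', $input);
for ($i = 0; $i < count($tokenized_input); ++$i) {
  // explode() instead of split(): split() was deprecated in PHP 5.3 and removed in PHP 7.
  $substrings = explode('</pre>', $tokenized_input[$i]);
  // The last piece is the text outside any <pre> block, so strip its newlines.
  $last = count($substrings) - 1;
  $substrings[$last] = str_replace("\n", '', $substrings[$last]);
  $tokenized_input[$i] = implode('</pre>', $substrings);
}

$output = implode('<pre>', $tokenized_input);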

Note that I didn't test this. It also assumes that:
- Your <pre> tags are all lowercase, with no attributes
- You're trying to remove only newline characters, not \r\n
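If those two assumptions do not hold, the same split-and-rejoin idea can be sketched with preg_split() and the PREG_SPLIT_DELIM_CAPTURE flag, so that tags in any case, tags with attributes, and \r\n line endings are handled. This is only a sketch: the function name remove_newlines_outside_pre and the sample input are invented for illustration, and it assumes every <pre> has a matching </pre>.

```php
<?php
// Hypothetical helper: split the input into <pre> blocks and the text between
// them, then strip line breaks from the in-between text only.
function remove_newlines_outside_pre(string $input): string
{
    // The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps each
    // <pre ...>...</pre> block as its own element of the result.
    $parts = preg_split(
        '~(<pre\b[^>]*>.*?</pre>)~is',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($parts === false) {
        return $input; // PCRE error: leave the text untouched
    }

    foreach ($parts as $i => $part) {
        // Even indexes hold text outside <pre>; odd indexes hold the blocks.
        if ($i % 2 === 0) {
            $parts[$i] = str_replace(["\r\n", "\n", "\r"], '', $part);
        }
    }

    return implode('', $parts);
}

$output = remove_newlines_outside_pre(
    "one\r\ntwo <PRE class=\"code\">a\r\nb</PRE> three\n"
);
var_dump($output);
```

Only the line breaks in "one\r\ntwo " and " three\n" are removed here; the \r\n inside the block is left in place.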

Upvotes: 0
