IgorAlves

Reputation: 5550

PHP regex to find and replace whitespace and/or newlines between HTML tags

I will have a string (one line) made of HTML code that will be stored in a PHP variable. The string comes from an HTML page that normally has newlines and whitespace between tags. There can be one or more newlines and/or spaces, as in this example:

<h1>tag1</h> 
       <p>Between h ad p we have \s and \n</p>

After running preg_replace with a regex, I would like to get this:

<h1>tag1</h><p>Between h ad p we have \s and \n</p>

I have tried this regex, but it is not working:

$str=<<<EOF
<h1>tag1</h> 
           <p>Between h ad p we have \s and \n</p>

EOF;


$string =  trim(preg_replace('/(>\s+<)|(>\n+<)/', ' ', $str)); 

You can find the entire code here: http://www.phpliveregex.com/p/7Pn

Upvotes: 2

Views: 3098

Answers (3)

nu11p01n73R

Reputation: 26667

There are two problems with

preg_replace('/(>\s+<)|(>\n+<)/', ' ', $str)
  • \s already includes \n, so there is no need for the second alternative (>\n+<).

  • In (>\s+<) the match consumes both angle brackets, > and <, so replacing it with a space removes the brackets along with the whitespace.
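A minimal PHP sketch of both points, assuming a simplified sample string instead of the question's heredoc (the text inside the <p> is shortened):

```php
<?php
// Simplified sample: a trailing space, a newline and indentation between two tags.
$str = "<h1>tag1</h> \n           <p>Between h ad p we have text</p>";

// Point 1: \s already matches a newline, so a separate \n alternative is redundant.
$newlineIsWhitespace = preg_match('/\s/', "\n"); // 1

// Point 2: the match includes ">" and "<", so replacing it with a single space
// deletes both brackets together with the whitespace.
$broken = preg_replace('/>\s+</', ' ', $str);

echo $broken, "\n"; // <h1>tag1</h p>Between h ad p we have text</p>
```

The broken tag in the output comes from the replacement string, not from the pattern failing to match.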

The output is

<h1>tag1</h p>Between h ad p we have \s and \n</p>

which is not what you want.

How to fix it

Use the regex (>\s+<) with the replacement string >< to get this output:

<h1>tag1</h><p>Between h ad p we have \s and \n</p>

For example, see http://regex101.com/r/dI1cP2/2
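A minimal sketch of the corrected call in PHP, assuming the same simplified sample string. trim() is kept from the question so the trailing newline is dropped, and the capture group is omitted because nothing refers back to it:

```php
<?php
$str = "<h1>tag1</h> \n           <p>Between h ad p we have text</p>\n";

// The pattern consumes ">" and "<", so the replacement writes them back.
$fixed = trim(preg_replace('/>\s+</', '><', $str));

echo $fixed, "\n"; // <h1>tag1</h><p>Between h ad p we have text</p>
```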

You can also use lookarounds to solve the issue.

The regex would be

(?<=>)\s+(?=<)

and the replacement string would be an empty string.

Explanation

(?<=>) asserts that the \s+ is preceded by >

\s+ matches one or more whitespace characters, including newlines

(?=<) asserts that the \s+ is followed by <

Here the lookarounds do not consume any angle brackets, unlike the earlier regex.

See http://regex101.com/r/dI1cP2/3 for an example.
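A minimal sketch of the lookaround version in PHP, assuming a made-up sample with a few extra tags to show that only whitespace sitting between > and < is removed, while spaces inside text are kept:

```php
<?php
$str = "<h1>tag1</h> \n           <p>Between h ad p we have text</p>\n<div>\n\t<span>x</span>\n</div>";

// (?<=>) and (?=<) only check the neighbouring characters and consume nothing,
// so just the whitespace run is matched and an empty replacement is safe.
$result = preg_replace('/(?<=>)\s+(?=<)/', '', $str);

echo $result, "\n"; // <h1>tag1</h><p>Between h ad p we have text</p><div><span>x</span></div>
```

Because the pattern ignores tag names, it also removes whitespace that can matter for rendering, such as the space in <b>a</b> <i>b</i> or line breaks inside <pre>.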

Upvotes: 5

vks

Reputation: 67968

(?<=<\/h>)\s+

Try this, and replace the matches with an empty string. See the demo:

http://regex101.com/r/jI8lV7/1
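A minimal sketch of this lookbehind approach in PHP, assuming a sample that adds a second paragraph. Since (?<=<\/h>) only checks for the literal </h>, whitespace after any other closing tag is left alone:

```php
<?php
$str = "<h1>tag1</h> \n           <p>Between h ad p we have text</p>\n<p>second</p>";

// Only whitespace directly after the literal "</h>" is removed.
$result = preg_replace('/(?<=<\/h>)\s+/', '', $str);

echo $result, "\n";
```

The newline between the two <p> elements survives, so this pattern fits the exact input in the question rather than HTML in general.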

Upvotes: 0

jogesh_pi

Reputation: 9782

You can try this:

echo preg_replace("/(?=\>\s+\n|\n)+(\s+)/", "", $str);

Upvotes: 1
