Remove whitespace between html attributes

Question

I'm trying to remove the whitespace (blank characters) inside HTML attributes:

Initial

Text With Spaces

Desired

Text with spaces

First I tried removing all the whitespace with preg_replace and then doing a find and replace to put the space I need back in. However, that removed all spaces from the document:

$cont = preg_replace( '/\s+/', '', $original);
$cont2 = str_replace('



Incorrect


 Textwithspaces

Quixrick · Accepted Answer

It makes it a little tricky in that you want to remove any spaces, except the one immediately after the tag name, but it certainly can be done.

$string = '
 Text With Spaces
 ';

$string = preg_replace_callback(
    '~<([A-Z0-9]+) \K(.*?)>~i',
    // Strip all whitespace from the part of the tag matched after \K.
    function ($m) { return preg_replace('~\s*~', '', $m[0]); },
    $string
);

print $string;

What I'm doing here is looking for an opening HTML bracket <, followed by a tag name, followed by a space. For the tag name, I'm using a character class [] that allows letters A-Z and numbers 0-9 (the i flag makes it case-insensitive). It must contain at least one of those characters +, but may contain more, all the way up until it hits a space. Then \K tells the regex to drop everything matched so far from the reported match, so $m[0] starts after that space. From there it looks for any character . any number of times *, and the ? makes that lazy, so it stops at the first occurrence of the next thing in the regex. In this case, that is a closing HTML bracket >.

Then I'm using preg_replace_callback to evaluate each match that it finds. For each match, it looks for any number of spaces \s* and replaces them with nothing.
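To make the effect of \K concrete, here is a minimal sketch that inspects a single match. The input markup is an assumption (the markup from the question did not survive on this page), and \s+ is used instead of \s* only to skip empty matches.

```php
<?php
// Hypothetical input; the original markup from the question is not shown.
$html = '<p class="Text With Spaces">Text With Spaces</p>';

// Same pattern as in the answer: \K discards "<p " from the reported match.
preg_match('~<([A-Z0-9]+) \K(.*?)>~i', $html, $m);

// $m[0] is the text after \K; $m[1] is still the captured tag name.
$matched = $m[0];
$tagName = $m[1];

// Strip whitespace from the matched part only.
$stripped = preg_replace('~\s+~', '', $matched);

print $matched . "\n";   // class="Text With Spaces">
print $tagName . "\n";   // p
print $stripped . "\n";  // class="TextWithSpaces">
```

Because the tag name and the space after it sit before \K, the callback never sees them, which is why that one space survives.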

This will give you what you are asking for:

Text With Spaces

However, it will also have some other side effects. Take, for example, a string like this:

$string = 'Google Website
 Text With Spaces
 ';

Using the above regular expression will give you results like this:

Google Website Text With Spaces
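One side effect that can be reproduced is the space between attributes disappearing. The sketch below assumes the link markup from the attribute string quoted further down in this answer, since the original example did not survive on this page.

```php
<?php
// Markup assumed from the attribute string quoted later in the answer.
$html = '<a href="http://www.google.com" style="color:orange;">Google Website</a>';

$result = preg_replace_callback(
    '~<([A-Z0-9]+) \K(.*?)>~i',
    function ($m) {
        // Removes every whitespace run after the tag name,
        // including the separator between the two attributes.
        return preg_replace('~\s+~', '', $m[0]);
    },
    $html
);

print $result;
// <a href="http://www.google.com"style="color:orange;">Google Website</a>
```

The closing tag is left alone because / is not in the tag-name character class, but href and style now run together.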

To get around that, you need more than one match and replace: you have to look for quotes/ticks around items, but only when those items are inside HTML tags. You can't do it with a single pattern like this:

<.*?"(.*?)".*?>

because it wouldn't handle things like "http://www.google.com" style="color:orange;" where you have multiple sets of quotes inside the HTML tag: only the first set gets captured. To work around this, I first pulled out the tags and then, once I had those, pulled out the content between each set of quotes for evaluation.
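A quick way to see the difference is to run both approaches on one tag. This is only a sketch; the tag is assumed from the attribute string above.

```php
<?php
// Tag assumed from the attribute string mentioned above.
$tag = '<a href="http://www.google.com" style="color:orange;">';

// Single pattern: the whole tag is consumed by one match,
// so only the first quoted value ends up in the capture group.
preg_match_all('~<.*?"(.*?)".*?>~', $tag, $single);
$singlePass = $single[1];

// Two steps: with the tag already isolated, each quoted value
// is matched on its own.
preg_match_all('~"(.*?)"~', $tag, $perQuote);
$twoStep = $perQuote[1];

print_r($singlePass); // one entry: http://www.google.com
print_r($twoStep);    // two entries: http://www.google.com and color:orange;
```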

$string = preg_replace_callback('~<(.*?)>~i', 

    function($m) {

        // Match a quote or tick, then everything up to the same closing character.
        $return_var = preg_replace_callback('~(["\']).*?\1~', function ($r) {$v = preg_replace('~\s*~', '', $r[0]); return $v;}, $m[0]);
        return $return_var;
    }, 

    $string);

print $string;

This is basically two preg_replace_callbacks shoved into one statement. The first tells the function to operate on any tags. The second one tells the function to operate on anything inside of quotes/ticks within those HTML tags.
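The nested approach can be wrapped up as a small function. This is a sketch under a few assumptions: the function name and the sample markup are hypothetical, and the quoted-value pattern uses a backreference (\1) so that a value must close with the same character that opened it.

```php
<?php
// Hypothetical helper name, used only for this illustration.
function stripSpacesInQuotedAttributes(string $html): string
{
    // Outer pass: isolate each tag.
    return preg_replace_callback(
        '~<.*?>~',
        function ($tag) {
            // Inner pass: isolate each quoted value inside the tag.
            return preg_replace_callback(
                '~(["\']).*?\1~',
                function ($quoted) {
                    // Remove whitespace inside the quoted value only.
                    return preg_replace('~\s+~', '', $quoted[0]);
                },
                $tag[0]
            );
        },
        $html
    );
}

// Hypothetical sample markup.
$html = '<a href="http://www.google.com" style="color: orange;" title=\'Text With Spaces\'>Google Website</a>';
$result = stripSpacesInQuotedAttributes($html);

print $result;
// <a href="http://www.google.com" style="color:orange;" title='TextWithSpaces'>Google Website</a>
```

The spaces between attributes and in the link text are untouched. Note that a > inside an attribute value would end the outer match early, so this only suits markup you know to be simple.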

Here is a demo so you can compare the different options
