Reputation: 632
I'm trying to remove the whitespace/blank characters inside HTML attributes:
Initial
<p style="font-size: 1.5em; color: navy;">
Text With Spaces
<p style="font-size: 1.2em; color: navy;">
Desired
<p style="font-size:1.5em;color:navy;">
Text With Spaces
<p style="font-size:1.2em;color:navy;">
First I tried removing all the whitespace with preg_replace and then doing a find-and-replace to put back the space after <p; however, that removed every space in the document, including the ones in the text.
$cont = preg_replace( '/\s+/', '', $original);
$cont2 = str_replace('<p', '<p ', $cont);
Incorrect
<p style="font-size:1.5em;color:navy;">
Textwithspaces
<p style="font-size:1.2em;color:navy;">
Upvotes: 0
Views: 3174
Reputation: 3200
This is a little tricky because you want to remove every space except the one immediately after the tag name, but it can certainly be done.
$string = '<p style="font-size: 1.5em; color: navy;">
Text With Spaces
<p style="font-size: 1.2em; color: navy;"> ';
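// Match each opening tag; \K keeps the tag name and its trailing space out of $m[0].
$string = preg_replace_callback('~<([A-Z0-9]+) \K(.*?)>~i', function ($m) {
    // Strip all whitespace from the attribute portion of the tag.
    return preg_replace('~\s*~', '', $m[0]);
}, $string);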
print $string;
What I'm doing here is looking for an opening HTML bracket <, followed by a tag name, followed by a space. For the tag name, I'm using a character class [] that allows letters A-Z and numbers 0-9 (the i flag makes it case-insensitive). The + means it must contain at least one of those characters, but may contain more than that, all the way up until it hits a space. Then \K tells the regex to drop everything matched so far from the reported match, so the match we get back starts from this point forward. After that, . looks for any character, * repeats it as many times as needed, and ? makes that repetition lazy, so it stops at the first place where the next thing in our regex can match. In this case, the next thing in our regex is a closing HTML bracket >.
Then I'm using preg_replace_callback to evaluate each match that it finds. For each match, it looks for any run of whitespace \s* and replaces it with nothing.
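To see what \K and the lazy ? contribute, here is a minimal sketch that runs three variants of the pattern against a made-up tag string. The sample string and variable names are assumptions for illustration only; they are not part of the code above.

```php
<?php
// Hypothetical sample input, used only for this illustration.
$tag = '<p style="color: navy;"> <b class="x y">';

// Without \K, the reported match ($m[0]) includes "<p ".
preg_match('~<([A-Z0-9]+) (.*?)>~i', $tag, $withoutK);

// With \K, the reported match starts after the tag name and its space.
preg_match('~<([A-Z0-9]+) \K(.*?)>~i', $tag, $withK);

// Without the lazy "?", .* runs on to the last ">" in the string.
preg_match('~<([A-Z0-9]+) \K(.*)>~i', $tag, $greedy);

echo $withoutK[0], "\n"; // <p style="color: navy;">
echo $withK[0], "\n";    // style="color: navy;">
echo $greedy[0], "\n";   // style="color: navy;"> <b class="x y">
```

Because the callback strips whitespace from $m[0], the \K is what protects the single space after the tag name, and the lazy quantifier is what keeps each match confined to one tag.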
This will give you what you are asking for:
<p style="font-size:1.5em;color:navy;">
Text With Spaces
<p style="font-size:1.2em;color:navy;">
However, it will also have some other side effects. Take, for example, a string like this:
$string = '<p style="font-size: 1.5em; color: navy;"><a href="http://www.google.com" style="color: orange;">Google Website</a>
Text With Spaces
<p style="font-size: 1.2em; color: navy;"> <img src="http://www.google.com/images/logo.gif" width=100 height=100>';
Using the above regular expression will give you results like this:
<p style="font-size:1.5em;color:navy;"><a href="http://www.google.com"style="color:orange;">Google Website</a>
Text With Spaces
<p style="font-size:1.2em;color:navy;"> <img src="http://www.google.com/images/logo.gif"width=100height=100>
To get around that, you need more than one match-and-replace pass: you have to look for quotes/ticks around items, but only for items that are inside HTML tags. You can't do something like this:
<.*?"(.*?)".*?>
because it would only capture the first quoted value in a tag like <a href="http://www.google.com" style="color: orange;">, where you have multiple sets of quotes inside the HTML tag. To work around this, I first pulled out the tags and then, once I had those, pulled out the content between each set of quotes for evaluation.
$string = preg_replace_callback('~<(.*?)>~i',
    function ($m) {
        // Within each tag, match whole quoted values ("..." or '...') and strip whitespace inside them.
        return preg_replace_callback('~"[^"]*"|\'[^\']*\'~', function ($r) {
            return preg_replace('~\s*~', '', $r[0]);
        }, $m[0]);
    },
    $string);
print $string;
This is basically two preg_replace_callback calls shoved into one statement. The first tells the function to operate on any tags. The second tells it to operate on anything inside quotes/ticks within those HTML tags.
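The same two-step idea can be wrapped in a reusable function. This is only a sketch: the name stripAttributeWhitespace is made up for this example, it needs PHP 7.4+ for the arrow function, and it assumes no attribute value contains a literal > character.

```php
<?php
// Hypothetical helper name; a sketch of the two-step approach, not a full HTML parser.
function stripAttributeWhitespace(string $html): string
{
    // Step 1: visit every tag (assumes attribute values contain no ">").
    return preg_replace_callback('~<[^>]*>~', function (array $tag): string {
        // Step 2: inside the tag, visit each double- or single-quoted value.
        return preg_replace_callback(
            '~"[^"]*"|\'[^\']*\'~',
            // Remove whitespace only within the quoted value itself.
            fn (array $quoted): string => preg_replace('~\s+~', '', $quoted[0]),
            $tag[0]
        );
    }, $html);
}

$html = '<p style="font-size: 1.5em; color: navy;"><a href="http://www.google.com" style="color: orange;">Google Website</a>';
echo stripAttributeWhitespace($html), "\n";
// <p style="font-size:1.5em;color:navy;"><a href="http://www.google.com" style="color:orange;">Google Website</a>
```

The space between href and style survives because only the quoted values are touched. Keep in mind that this also collapses values where spaces are meaningful, such as a class list like class="a b".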
Here is a demo so you can compare the different options
Upvotes: 1