Reputation: 2504
I have this string
<h1 class='' id='' title=''></h1>
I want to remove all whitespace except the whitespace after h1, so that the output looks like this:
<h1 class=''id=''title=''></h1>
I tried doing it in two steps:
var string = "<h1 class='' id='' title=''></h1>";
var regexp = /\s/g;
var regexp2 = /(<h1)/g;
var string = string.replace(regexp, "").replace(regexp2, "$1 ");
console.log(string);
I'd like to know whether there is a way to combine those two regexps into one. I tried:
var string = "<h1 class='' id='' title=''></h1>";
var regexp = /(?!<h1)\s/g;
var string = string.replace(regexp, "");
console.log(string);
Unfortunately, it didn't work that way. I'd like an explained answer on how to remove all whitespace in my string except the whitespace after <h1, knowing that this is one of numerous h1 lines. I also want to remove every \n and \t, which is why I'm using \s in my regexp.
As for why: inside <...>, I want to remove the whitespace after each attribute but not the whitespace after the tag name (such as h1), and also remove all newline and tab characters, in an HTML5 document. It's just for my own curiosity and to practice regexps.
Upvotes: 3
Views: 725
Reputation: 626929
The point here is that you want to match a block of text and remove something globally only inside one of its subparts. With regex, you usually achieve that by matching the whole block while capturing its smaller subparts with capturing groups (paired (...)) and, inside the replace method, using a callback function that receives all these groups (and a few more arguments).
Here is a regex matching the blocks:
/(<h1\s+)([^<]*?>)/g
See the regex demo.
Now, you can see there are 2 groups:
(<h1\s+) - <h1 followed by one or more whitespace characters
([^<]*?>) - zero or more chars other than <, as few as possible, up to the first > character.
You only need to remove the whitespace between attribute values and the following attribute names in the second capture, so use /(\w+='')\s+/g and replace with $1 (a backreference to the text captured with (\w+='')) inside the callback. The callback receives the whole match as its first argument, then all the captured subtexts (you can also add offset and input arguments, see Specifying a function as a parameter).
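To make the two groups concrete, here is a minimal sketch that runs the block regex once (without the g flag) and inspects what each group captured. The variable names are only illustrative.

```javascript
// Run the block-matching regex once and look at the captured groups.
var blockRe = /(<h1\s+)([^<]*?>)/;
var m = blockRe.exec("<h1 class='' id='' title=''>Title1</h1>");

console.log(JSON.stringify(m[0])); // whole match: the opening tag
console.log(JSON.stringify(m[1])); // group 1: "<h1 " including its trailing space
console.log(JSON.stringify(m[2])); // group 2: the attributes up to and including ">"
```

Only group 2 is passed through the inner replace, which is why the space after <h1 survives.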
var s = `<h1 class='' id='' title=''>Title1</h1>
<h1 class='' id='' title=''>Title2</h1>`;
var res_es6 = s.replace(/(<h1\s+)([^<]*?>)/g, (m,grp1,grp2)=>grp1+grp2.replace(/(\w+='')\s+/g, '$1'));
var res_es5 = s.replace(/(<h1\s+)([^<]*?>)/g, function (m, grp1, grp2) {
  return grp1 + grp2.replace(/(\w+='')\s+/g, '$1');
});
console.log(res_es6);
console.log(res_es5);
Note there are two result variables, one for the ES6 syntax and one for ES5. The ES6 version uses an arrow function instead of an anonymous function declared with the function keyword. Older browsers, notably Internet Explorer, do not support arrow functions, so use the ES5 version if you need to target them.
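As a single-regex alternative, the idea can also be sketched with a negative lookbehind, assuming an engine with ES2018 lookbehind support. The (?!<h1)\s attempt from the question cannot work because a lookahead placed before \s inspects the text starting at the whitespace character itself, which never begins with <h1, so the condition has to look backwards instead. The function name below is hypothetical. Unlike the callback approach, this removes every whitespace character in the string (including whitespace inside text content) and keeps only the one directly following <h1.

```javascript
// Assumption: the runtime supports lookbehind assertions (ES2018+).
// Hypothetical helper name, not taken from the question or answers.
function stripWhitespaceExceptAfterH1(html) {
  // (?<!<h1) : only match whitespace NOT immediately preceded by "<h1"
  return html.replace(/(?<!<h1)\s/g, "");
}

var lbInput = "<h1 class='' id='' title=''>A</h1>\n<h1 class=''\tid=''>B</h1>";
var lbOutput = stripWhitespaceExceptAfterH1(lbInput);
console.log(lbOutput);
// <h1 class=''id=''title=''>A</h1><h1 class=''id=''>B</h1>
```

If <h1 is followed by several whitespace characters, only the first one is kept, since the later ones are preceded by whitespace rather than by <h1.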
Upvotes: 1
Reputation: 43169
Substitute /(\w+='')\ /g (note the escaped space before the closing slash) with $1; see a demo on regex101.com.
The full JS code:
var string = "<h1 class='' id='' title=''></h1>";
var regexp = /(\w+='')\ /g;
var string = string.replace(regexp, "$1");
alert(string);
See a demo on jsfiddle.net.
As pointed out by @Redu, you could change it to (\w+='')\s+ to handle multiple consecutive whitespace characters.
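A quick sketch of that \s+ variant on an input mixing spaces, a newline and a tab (the sample string is made up for illustration). Note that the pattern, as written, only matches empty single-quoted values like the ones in the question.

```javascript
// \s+ also consumes runs of spaces, tabs and newlines after each attribute.
var messy = "<h1 class=''   id=''\n\ttitle=''></h1>";
var cleaned = messy.replace(/(\w+='')\s+/g, "$1");
console.log(cleaned); // <h1 class=''id=''title=''></h1>
```

The space after h1 is untouched because h1 is not followed by ='', so the capture group never matches there.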
Upvotes: 3