wwaawaw

Reputation: 7127

Why does this regex/DOM character entity tester return `undefined`?

var str = 'let us pretend that this is a blog about gardening&amp;cooking; here&apos;s an apostrophe &amp; ampersand just for fun.';

This is the string I'm operating on. The desired end result is: "let us pretend that this is a blog about gardening&cooking; here's an apostrophe & ampersand just for fun."

console.log('Before: ' + str);


str = str.replace(/&(?:#x?)?[0-9a-z]+;?/gi, function(m){
  var d = document.createElement('div');
  console.log(m);
  d.innerHTML = m.replace(/&amp;/, '&');
  console.log(d.innerHTML + '|' + d.textContent);
  return !!d.textContent.match(m.replace(/&amp;/, '&')[0]) ? m : d.textContent;
});


console.log('After: ' + str);

Upvotes: 1

Views: 148

Answers (2)

Bergi

Reputation: 664548

This should do what you want:

str.replace(/&amp;([#x]\d+;|[a-z]+;)/g, "&$1")

or, with a positive lookahead:

str.replace(/&amp;(?=[#x]\d+;|[a-z]+;)/g, "&")

I don't think you need any HTML2text en-/decoding.
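
To see the idea in action, here is a minimal sketch. It assumes the goal is to strip one layer of escaping from a double-escaped string, so the pattern targets a literal `&amp;` that is immediately followed by something shaped like an entity. The `sample` string and the variable names are made up for illustration and are not taken from the question.

```javascript
// Assumed sample input, double-escaped on purpose (hypothetical, not from the question).
const sample = 'gardening&amp;cooking; here&amp;#39;s a quote &amp; more';

// Variant 1: capture the entity-like tail and re-emit it after a bare ampersand.
const viaCapture = sample.replace(/&amp;([#x]\d+;|[a-z]+;)/g, '&$1');

// Variant 2: a lookahead checks the tail without consuming it,
// so only the "&amp;" itself is replaced.
const viaLookahead = sample.replace(/&amp;(?=[#x]\d+;|[a-z]+;)/g, '&');

console.log(viaCapture);
console.log(viaLookahead);
// Both print: gardening&cooking; here&#39;s a quote &amp; more
```

The standalone `&amp;` before " more" is left alone because nothing entity-like follows it. Note that `[#x]\d+;` as written does not match hexadecimal references such as `#x27;`, so those would stay double-escaped unless the pattern is extended.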

Upvotes: 0

Warlock

Reputation: 7471

The problem is that HTML (before HTML5) doesn't define XML's `&apos;` entity, so a browser may leave it undecoded. To avoid the issue, use `&#39;` instead of `&apos;`.

For more information look at this post:

Why shouldn't `&apos;` be used to escape single quotes?
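
As a rough sketch of that workaround, the string can be normalized before it is handed to the DOM for decoding. `normalizeApos` is a hypothetical helper name and the input string is only an example:

```javascript
// Hypothetical helper (not part of the original answer): rewrite the named
// entity &apos; as the numeric reference &#39;, which HTML parsers decode.
function normalizeApos(html) {
  return html.replace(/&apos;/g, '&#39;');
}

// Example input, assumed for illustration.
const input = 'here&apos;s an apostrophe &amp; ampersand';
const normalized = normalizeApos(input);

console.log(normalized);
// prints: here&#39;s an apostrophe &amp; ampersand
```

Only `&apos;` is touched; other entities such as `&amp;` pass through unchanged, so the result can still be decoded in the usual way afterwards.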

Upvotes: 1
