Reputation:
I have a JavaScript string that I'm writing to a file. I need to replace any unpaired (lone) surrogates with the replacement character U+FFFD. Is there a regex character class that matches only unpaired surrogates, or do I have to do some additional processing?
Upvotes: 2
Views: 620
Reputation:
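function toWellFormed(s) {
  // With the u flag the regex matches by code point, so a valid surrogate
  // pair counts as one non-surrogate character and only lone surrogates match.
  return s.replace(/\p{Surrogate}/gu, '\uFFFD');
}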
toWellFormed('foo 𝌆') // 'foo 𝌆'
toWellFormed('foo \uD834\uDF06') // 'foo 𝌆'
toWellFormed('foo \uD834') // 'foo �'
toWellFormed('foo \uDF06\uDF06\uDF06') // 'foo ���'
Upvotes: 1
Reputation: 14749
String.prototype.toWellFormed() replaces any lone surrogates with the Unicode replacement character U+FFFD (�).
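If the code may run on an engine that predates the built-in method, one option is to feature-detect it and fall back to the regex approach. This is only a minimal sketch: the helper name `replaceLoneSurrogates` is made up for illustration, and the fallback assumes the engine supports Unicode property escapes in regular expressions.

```javascript
// Hypothetical helper (the name is not part of any standard API).
function replaceLoneSurrogates(s) {
  // Prefer the native method when the engine provides it.
  if (typeof String.prototype.toWellFormed === 'function') {
    return s.toWellFormed();
  }
  // Fallback: with the u flag, a valid surrogate pair is matched as a single
  // code point, so \p{Surrogate} only matches unpaired surrogates.
  return s.replace(/\p{Surrogate}/gu, '\uFFFD');
}

console.log(replaceLoneSurrogates('foo \uD834\uDF06') === 'foo \uD834\uDF06'); // true
console.log(replaceLoneSurrogates('foo \uD834') === 'foo \uFFFD'); // true
```

Where it exists, the companion method String.prototype.isWellFormed() can be used to check whether a string contains any lone surrogates before deciding to rewrite it.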
Upvotes: 3