Reputation: 55
I'm trying to filter out Unicode characters that aren't related to language from a string.
Here's an example of what I want:
const filt1 = "This will not be replaced: æ Ç ü"; // This will not be replaced: æ Ç ü
const filt2 = "This will be replaced: » ↕ ◄"; // This will be replaced:
How would I go about doing this? Characters such as accented letters and Chinese characters are what I want to keep. Arrows, blocks, emoji, etc. should be filtered out.
I've found various regex filters online, but none do exactly what I want. This one works the best, but it's bulky and does not include non-accented alphanumeric characters.
((?![a-zA-ZàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ ]).)*
Upvotes: 5
Views: 1093
Reputation: 73231
You could try a Unicode property escape regex: /[^\p{L}\s]/ugi
console.log('This will be replaced: » ↕ ◄, This will not be replaced: æ Ç ü'.replace(/[^\p{L}\s]/ugi, ''));
Unicode property escapes were added in ES2018. Browser support is currently limited; Node.js supports them from version 10.
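Note that `\p{L}` on its own also strips digits and punctuation, so the colon in your example would be removed too. If you want to keep those, here is a rough sketch of one way to do it. The helper name `keepLanguageChars` and the short allow-list of ASCII punctuation are my own assumptions, not anything from your question, so adjust them to taste. I list the punctuation explicitly instead of using `\p{P}` because `»` is itself classified as punctuation and would otherwise survive.

```javascript
// Hypothetical helper: the name and the punctuation allow-list are assumptions.
function keepLanguageChars(input) {
  // Remove everything that is not a letter (\p{L}), combining mark (\p{M}),
  // number (\p{N}), whitespace, or one of a few ASCII punctuation marks.
  const notAllowed = /[^\p{L}\p{M}\p{N}\s.,:;!?'"()\-]/gu;
  return input
    .replace(notAllowed, '')
    .replace(/\s+/g, ' ') // collapse the gaps left behind by removed symbols
    .trim();
}

console.log(keepLanguageChars('This will not be replaced: æ Ç ü')); // This will not be replaced: æ Ç ü
console.log(keepLanguageChars('This will be replaced: » ↕ ◄'));     // This will be replaced:
```

`\p{M}` is there so that accents written as separate combining characters are kept. One side effect is that invisible variation selectors left over from some emoji can survive, since they are also classified as marks.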
Upvotes: 4