Reputation: 9530
I found a slug function that looks like this:
function slug(string) {
return string.toString().toLowerCase()
.replace(/\s+/g, '-')
.replace(/[^\w\-]+/g, '')
.replace(/\-\-+/g, '-')
.replace(/^-+/, '')
.replace(/-+$/, '');
};
However, it doesn't work for Russian, Greek, and other non-Latin characters: they are removed at the step .replace(/[^\w\-]+/g, '').
I don't want that, but I still want to remove special characters that are not letters in any language.
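The reason is that, without the u flag, \w in JavaScript matches only ASCII letters, digits, and the underscore. A minimal check of that step in isolation (the name asciiOnly is just for illustration):

```javascript
// Without the u flag, \w is equivalent to [A-Za-z0-9_].
const asciiOnly = /[^\w\-]+/g;

// Latin letters and hyphens survive the filter.
console.log('do-you-know'.replace(asciiOnly, '')); // do-you-know

// Cyrillic letters are not matched by \w, so only the hyphen is left.
console.log('ты-знаешь'.replace(asciiOnly, '')); // -
```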
Example:
Language | Input | Expected slug
English | Do you know it rains? | do-you-know-it-rains
Czech | víš, že prší? | vis-ze-prsi
Romanian | Ști că plouă? | sti-ca-ploua
Russian | ты знаешь, что идет дождь? | ты-знаешь-что-идет-дождь
Note:
For the Latin alphabet I want to keep the letters but strip the diacritics; for non-Latin alphabets I want to keep the letters as they are (I don't want to convert them into Latin characters).
Upvotes: 4
Views: 4372
Reputation: 13973
Here is an approach that handles special characters. Using an array of objects, you group every special character you want to replace under the Latin character that replaces it.
However, to leave Greek and Russian untouched, you need a regex that treats Greek and Cyrillic letters as word characters. So after replacing the special characters with the mapping, remove all remaining non-word characters with the character class [^-a-zа-я\u0370-\u03ff\u1f00-\u1fff].
This class keeps the dash, the Latin letters a-z, the Cyrillic letters а-я, and the ranges \u0370-\u03ff (Greek and Coptic) and \u1f00-\u1fff (Greek Extended).
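To see what that character class accepts, here is a small check. It is only a sketch: the names allowed and samples are illustrative, and the Cyrillic range is written with escapes (\u0430-\u044f is the same range as а-я). It suggests that ё, digits, and the underscore fall outside the class, so you may want to extend it if your slugs need them.

```javascript
// Positive form of the class used in the answer, anchored to test one character.
const allowed = /^[-a-z\u0430-\u044f\u0370-\u03ff\u1f00-\u1fff]$/;

const samples = [
  'd',      // Latin letter
  '\u0436', // Cyrillic zhe
  '\u03bb', // Greek lambda
  '\u1f00', // Greek alpha with psili (Greek Extended block)
  '-',      // dash
  '\u0451', // Cyrillic yo, U+0451, just past the а-я range
  '7',      // digit
  '_',      // underscore
];

for (const ch of samples) {
  console.log(JSON.stringify(ch), allowed.test(ch));
}
// The first five print true, the last three print false.
```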
You can use Wikipedia's language recognition chart to add more special characters to the sets.
function slugify(text) {
text = text.toString().toLowerCase().trim();
const sets = [
{to: 'a', from: '[ÀÁÂÃÄÅÆĀĂĄẠẢẤẦẨẪẬẮẰẲẴẶἀ]'},
{to: 'c', from: '[ÇĆĈČ]'},
{to: 'd', from: '[ÐĎĐÞ]'},
{to: 'e', from: '[ÈÉÊËĒĔĖĘĚẸẺẼẾỀỂỄỆ]'},
{to: 'g', from: '[ĜĞĢǴ]'},
{to: 'h', from: '[ĤḦ]'},
{to: 'i', from: '[ÌÍÎÏĨĪĮİỈỊ]'},
{to: 'j', from: '[Ĵ]'},
{to: 'ij', from: '[\u0132]'}, // U+0132, the single-character IJ ligature (a plain [IJ] would match every i and j)
{to: 'k', from: '[Ķ]'},
{to: 'l', from: '[ĹĻĽŁ]'},
{to: 'm', from: '[Ḿ]'},
{to: 'n', from: '[ÑŃŅŇ]'},
{to: 'o', from: '[ÒÓÔÕÖØŌŎŐỌỎỐỒỔỖỘỚỜỞỠỢǪǬƠ]'},
{to: 'oe', from: '[Œ]'},
{to: 'p', from: '[ṕ]'},
{to: 'r', from: '[ŔŖŘ]'},
{to: 's', from: '[ßŚŜŞŠȘ]'},
{to: 't', from: '[ŢŤ]'},
{to: 'u', from: '[ÙÚÛÜŨŪŬŮŰŲỤỦỨỪỬỮỰƯ]'},
{to: 'w', from: '[ẂŴẀẄ]'},
{to: 'x', from: '[ẍ]'},
{to: 'y', from: '[ÝŶŸỲỴỶỸ]'},
{to: 'z', from: '[ŹŻŽ]'},
{to: '-', from: '[·/_,:;\']'}
];
sets.forEach(set => {
text = text.replace(new RegExp(set.from, 'gi'), set.to);
});
return text
.replace(/\s+/g, '-') // Replace spaces with -
.replace(/[^-a-zа-я\u0370-\u03ff\u1f00-\u1fff]+/g, '') // Remove all non-word chars
.replace(/--+/g, '-') // Replace multiple - with single -
.replace(/^-+/, '') // Trim - from start of text
.replace(/-+$/, ''); // Trim - from end of text
}
console.log(slugify('Do you know it rains?'));
console.log(slugify('víš, že prší?'));
console.log(slugify('Ști că plouă?'));
console.log(slugify('ты знаешь, что идет дождь?'));
console.log(slugify('ἀεὶ Λιβύη φέρει τι καινόν'));
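As a possible alternative to maintaining the mapping by hand, the sketch below uses Unicode normalization and Unicode property escapes (ES2018, so it assumes a reasonably recent engine). This is not the method used above: slugifyUnicode is a hypothetical name, and unlike the regex above it keeps digits and letters of every script. It decomposes the text, drops combining marks only when they follow a Latin letter, and recomposes everything else.

```javascript
// Hypothetical alternative: strip diacritics from Latin letters only,
// keep letters of other scripts unchanged.
function slugifyUnicode(text) {
  return String(text)
    .toLowerCase()
    // Split letters into base letter + combining marks (í becomes i + acute).
    .normalize('NFD')
    // Drop the marks only when the base letter belongs to the Latin script.
    .replace(/(\p{Script=Latin})\p{M}+/gu, '$1')
    // Recompose what is left, so non-Latin letters get their marks back in one piece.
    .normalize('NFC')
    // Turn every run of characters that are not letters, marks, or numbers into one dash.
    .replace(/[^\p{L}\p{M}\p{N}]+/gu, '-')
    // Trim dashes from both ends.
    .replace(/^-+|-+$/g, '');
}

console.log(slugifyUnicode('Do you know it rains?'));      // do-you-know-it-rains
console.log(slugifyUnicode('víš, že prší?'));              // vis-ze-prsi
console.log(slugifyUnicode('Ști că plouă?'));              // sti-ca-ploua
console.log(slugifyUnicode('ты знаешь, что идет дождь?')); // ты-знаешь-что-идет-дождь
```

One limitation: Latin letters that have no canonical decomposition, such as ø, ł, ß, or æ, pass through unchanged, so a small mapping like the sets above is still needed if those should become plain ASCII.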
Upvotes: 9