paulalexandru

Reputation: 9530

JavaScript slug function that also works for non-Latin characters

Basically I found a slug function which looks like this:

function slug(string) {
    return string.toString().toLowerCase()
        .replace(/\s+/g, '-')
        .replace(/[^\w\-]+/g, '')
        .replace(/\-\-+/g, '-')
        .replace(/^-+/, '')
        .replace(/-+$/, '');
};

However, it doesn't work for Russian, Greek, and other non-Latin characters. They are removed by the .replace(/[^\w\-]+/g, '') step, because \w only matches [A-Za-z0-9_]. I want to keep those letters, but still remove special characters that are not regular letters in any language.

Example:

English | Do you know it rains? | do-you-know-it-rains

Czech | víš, že prší? | vis-ze-prsi

Romanian | Ști că plouă? | sti-ca-ploua

Russian | ты знаешь, что идет дождь? | ты-знаешь-что-идет-дождь

Note:

For the Latin alphabet I want to keep the letters but remove the diacritics. For non-Latin alphabets I want to keep the letters as they are (I don't want to convert them into Latin characters).

Upvotes: 4

Views: 4372

Answers (1)

jo_va

Reputation: 13973

Here is an approach that works for special characters. Using a set of objects, you group every special character you want to replace under the Latin character that will replace it.

However, to leave Greek and Russian untouched, the final cleanup has to treat Greek and Cyrillic letters as word characters. So after replacing the special characters with the sets described above, you remove everything that is not matched by the character class [^-a-zа-я\u0370-\u03ff\u1f00-\u1fff].

This character class keeps the dash, the Latin letters a-z, the Cyrillic letters а-я, and finally \u0370-\u03ff and \u1f00-\u1fff, which are the Greek and Coptic and the Greek Extended Unicode blocks.
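To make the effect of that character class concrete, here is a minimal sketch that applies it on its own. The helper name stripNonSlugChars is hypothetical and only used for illustration. Note two consequences of the ranges as written: digits are dropped (unlike the \w version in the question), and the Cyrillic letter ё (U+0451) falls outside а-я (U+0430 to U+044F), so it is dropped as well.

```javascript
// Same character class as described above: keep "-", a-z, а-я and the two Greek blocks.
const NON_SLUG_CHARS = /[^-a-zа-я\u0370-\u03ff\u1f00-\u1fff]+/g;

// Hypothetical helper, for illustration only: removes everything the class does not keep.
function stripNonSlugChars(text) {
  return text.replace(NON_SLUG_CHARS, '');
}

console.log(stripNonSlugChars('ты-знаешь,-что?')); // 'ты-знаешь-что'
console.log(stripNonSlugChars('top-10'));          // 'top-'  (digits are not in the class)
console.log(stripNonSlugChars('ёлка'));            // 'лка'   (ё is outside а-я)
```

If you need digits or ё in your slugs, you could extend the class yourself, for example with 0-9 and ё.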

You can use this Wikipedia language recognition chart to add more special characters to the sets.

function slugify(text) {
  text = text.toString().toLowerCase().trim();

  const sets = [
    {to: 'a', from: '[ÀÁÂÃÄÅÆĀĂĄẠẢẤẦẨẪẬẮẰẲẴẶἀ]'},
    {to: 'c', from: '[ÇĆĈČ]'},
    {to: 'd', from: '[ÐĎĐÞ]'},
    {to: 'e', from: '[ÈÉÊËĒĔĖĘĚẸẺẼẾỀỂỄỆ]'},
    {to: 'g', from: '[ĜĞĢǴ]'},
    {to: 'h', from: '[ĤḦ]'},
    {to: 'i', from: '[ÌÍÎÏĨĪĮİỈỊ]'},
    {to: 'j', from: '[Ĵ]'},
    {to: 'ij', from: '[Ĳ]'},
    {to: 'k', from: '[Ķ]'},
    {to: 'l', from: '[ĹĻĽŁ]'},
    {to: 'm', from: '[Ḿ]'},
    {to: 'n', from: '[ÑŃŅŇ]'},
    {to: 'o', from: '[ÒÓÔÕÖØŌŎŐỌỎỐỒỔỖỘỚỜỞỠỢǪǬƠ]'},
    {to: 'oe', from: '[Œ]'},
    {to: 'p', from: '[ṕ]'},
    {to: 'r', from: '[ŔŖŘ]'},
    {to: 's', from: '[ßŚŜŞŠȘ]'},
    {to: 't', from: '[ŢŤ]'},
    {to: 'u', from: '[ÙÚÛÜŨŪŬŮŰŲỤỦỨỪỬỮỰƯ]'},
    {to: 'w', from: '[ẂŴẀẄ]'},
    {to: 'x', from: '[ẍ]'},
    {to: 'y', from: '[ÝŶŸỲỴỶỸ]'},
    {to: 'z', from: '[ŹŻŽ]'},
    {to: '-', from: '[·/_,:;\']'}
  ];

  sets.forEach(set => {
    text = text.replace(new RegExp(set.from,'gi'), set.to)
  });

  return text
    .replace(/\s+/g, '-')    // Replace spaces with -
    .replace(/[^-a-zа-я\u0370-\u03ff\u1f00-\u1fff]+/g, '') // Remove all non-word chars
    .replace(/--+/g, '-')    // Replace multiple - with single -
    .replace(/^-+/, '')      // Trim - from start of text
    .replace(/-+$/, '')      // Trim - from end of text
}

console.log(slugify('Do you know it rains?'));
console.log(slugify('víš, že prší?'));
console.log(slugify('Ști că plouă?'));
console.log(slugify('ты знаешь, что идет дождь?'));
console.log(slugify('ἀεὶ Λιβύη φέρει τι καινόν'));
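As an alternative sketch (not the approach above, and not tested beyond the examples from the question), you could let Unicode normalization do the diacritic stripping instead of maintaining a lookup table. The function name slugifyUnicode is hypothetical. It assumes an engine that supports String.prototype.normalize and the \p{...} property escapes with the u flag. Combining marks are removed only after a Latin base letter, so Cyrillic й or Greek ἀ are recomposed unchanged. Letters that have no canonical decomposition (ł, ø, ß, æ, ...) are kept as they are, so a table like the one above would still be needed for those.

```javascript
// Hypothetical alternative: strip diacritics from Latin letters via NFD,
// keep letters and digits of any other script untouched.
function slugifyUnicode(text) {
  return text.toString().toLowerCase().trim()
    .normalize('NFD')                            // split letters from their combining marks
    .replace(/([a-z])[\u0300-\u036f]+/g, '$1')   // drop marks only after a Latin base letter
    .normalize('NFC')                            // recompose everything else (й, ё, ἀ, ...)
    .replace(/[\s·\/_,:;']+/g, '-')              // separators and whitespace become -
    .replace(/[^-\p{L}\p{N}]+/gu, '')            // remove anything that is not a letter, digit or -
    .replace(/-{2,}/g, '-')                      // collapse multiple - into one
    .replace(/^-+|-+$/g, '');                    // trim - from both ends
}

console.log(slugifyUnicode('Do you know it rains?'));      // 'do-you-know-it-rains'
console.log(slugifyUnicode('víš, že prší?'));              // 'vis-ze-prsi'
console.log(slugifyUnicode('Ști că plouă?'));              // 'sti-ca-ploua'
console.log(slugifyUnicode('ты знаешь, что идет дождь?')); // 'ты-знаешь-что-идет-дождь'
```

The trade-off is that \p{L} keeps letters from every script, which may be more permissive than you want.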

Upvotes: 9
