user5352

Reputation: 113

How to sanitize all Unicode characters in a URL

We have a chat-based app where our customer reps chat with end users. Recently our security team found an issue with the app: a user can inject a homograph version of a URL into the chat window.

Example: the end user types the following question in the chat window: How to change my email settings at http://www.abcоs.ca

In the above example the site is a homographed version, where "оs.ca" contains a non-ASCII character, whereas the original URL could be http://www.abcos.ca (I just made up these URL examples).

So I tried the following code in my JavaScript:

var chatMessage = 'How to change my email settings at http://www.abcоs.ca';
// normalize() and replace() return new strings, so keep the result
chatMessage = chatMessage.normalize('NFD').replace(/[^\u0000-\u007f]/g, '');

The above script works well and strips the non-ASCII character, but we support both English and French chat.

In French, if the chatMessage is "Comment modifier mes paramètres de messagerie sur http://www.abcоs.ca"

then it replaces "è" with "e" in the sanitized version.

Is there a way in JavaScript to detect non-ASCII characters only within a URL in the input text? The expected outcome would be: "Comment modifier mes paramètres de messagerie sur http://www.abc", so that French characters are still retained in the text, but non-ASCII characters within URLs are sanitized.

Appreciate expert advice and guidance.

Upvotes: 1

Views: 1411

Answers (1)

Keith

Reputation: 24221

Ok,

Unicode publishes a list of confusables you could use here -> https://www.unicode.org/Public/security/10.0.0/confusables.txt

There is also an NPM package containing this data @ https://www.npmjs.com/package/unicode-confusables

Using this data we could check for any confusables and replace them with the normal character, or alternatively just replace them with some glyph to show it's a confusable. The second is probably the best option, as it lets the reader know the person posting the message is maybe someone not to trust.

There is also a CDN version, which I've used in the snippet below.

Example..

const tests = [
 "оs.ca",
 "Comment modifier mes paramètres de messagerie sur http://www.abcоs.ca"
];


async function run() {
  // Load the confusables lookup table from the CDN.
  const f = await fetch('https://cdn.jsdelivr.net/npm/[email protected]/confusables.json');
  const confusables = await f.json();

  // Replace every character that has an entry in the table: with a warning
  // glyph when `show` is true, otherwise with its entry from the table.
  function sanitize(a, show) {
    // Spread into code points so surrogate pairs stay intact.
    const chars = [...a];
    for (let l = 0; l < chars.length; l += 1) {
      const confused = confusables[chars[l]];
      if (confused !== undefined) {
        chars[l] = show ? '🚫' : confused;
      }
    }
    return chars.join('');
  }

  console.log('show confusables');
  for (const test of tests)
    console.log(sanitize(test, true));

  console.log('replace with non-confusables');
  for (const test of tests)
    console.log(sanitize(test, false));
}



run();
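
The snippet above runs the whole message through the table. If you only want to touch the URLs, as the question asks, one option is to find the URLs first and sanitize just those parts. Below is a rough sketch of that idea, not a complete solution. The names `URL_PATTERN` and `sanitizeUrlsOnly` are made up for this example, and the pattern is a deliberately simple assumption (anything starting with http:// or https:// up to the next whitespace). It also strips plain non-ASCII instead of using the confusables table, and it skips the `normalize('NFD')` step, since that step is what splits "è" into "e" plus a combining accent that then gets removed.

```javascript
// Assumption: a URL is "http://" or "https://" followed by non-whitespace.
// Real-world messages may need a stricter or broader pattern.
const URL_PATTERN = /https?:\/\/\S+/gi;

// Hypothetical helper: strips (or flags) non-ASCII characters inside URLs only,
// leaving the rest of the message, including French accents, untouched.
function sanitizeUrlsOnly(message, show = false) {
  return message.replace(URL_PATTERN, (url) =>
    // The "u" flag makes the class match whole code points, not UTF-16 halves.
    url.replace(/[^\u0000-\u007f]/gu, show ? '🚫' : '')
  );
}

// \u043e is the Cyrillic "о" from the question's example.
const message =
  'Comment modifier mes param\u00e8tres de messagerie sur http://www.abc\u043es.ca';

console.log(sanitizeUrlsOnly(message));       // remove non-ASCII in the URL
console.log(sanitizeUrlsOnly(message, true)); // flag non-ASCII in the URL
```

With this input the first call keeps "paramètres" as-is and leaves the URL as http://www.abcs.ca, because only the Cyrillic "о" is non-ASCII. Flagging (the second call) is probably still the safer choice, since silently repairing a URL can turn it into a different, valid-looking address.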

Upvotes: 1
