abhishekd

Reputation: 133

How to check if a string has any non-ISO-8859-1 characters with JavaScript?

I want to write a string validator (or regex) for ISO-8859-1 characters in JavaScript.

If a string contains any non-ISO-8859-1 character, the validator must return false; otherwise it must return true. E.g.:

str = "abcÂÃ";
validator(str); // should return true;

str = "a 你 好";
validator(str); // should return false;

str = "你 好";
validator(str); // should return false;

I have tried the following regex, but it does not work correctly.

var regex = /^[\u0000-\u00ff]+/g;
var res = regex.test(value);

Upvotes: 10

Views: 14430

Answers (2)

iegik

Reputation: 1477

Just in case you want to have an alternative way 😉...

ISO-8859-1 is also called "Latin-1"; its characters correspond to the first 256 Unicode code points (U+0000 to U+00FF): https://en.wikipedia.org/wiki/ISO/IEC_8859-1

So let's try a native function that accepts only Latin-1 input...

Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data https://developer.mozilla.org/en-US/docs/Web/API/btoa

const validator = (str) => {
  try {
    // btoa throws if any character is outside the Latin-1 range
    btoa(str);
    return true;
  } catch {
    return false;
  }
};

For a string with characters outside that range, btoa throws the following error:

Uncaught DOMException: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.
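As a rough sketch of how this behaves on the inputs from the question, the check can be wrapped with a guard for environments that have no global btoa. The name latin1ViaBtoa is hypothetical and used only for illustration. The guard assumes btoa is exposed as a global function where available, as in browsers.

```javascript
// Sketch only: latin1ViaBtoa is a hypothetical name, not part of any API.
function latin1ViaBtoa(str) {
  // Assumption: btoa is a global function where it exists (e.g. browsers).
  if (typeof btoa !== "function") {
    throw new Error("btoa is not available in this environment");
  }
  try {
    btoa(str); // throws when any UTF-16 code unit is above 0xFF
    return true;
  } catch {
    return false;
  }
}

if (typeof btoa === "function") {
  console.log(latin1ViaBtoa("abcÂÃ"));   // true
  console.log(latin1ViaBtoa("a 你 好")); // false
  console.log(latin1ViaBtoa("╗"));       // false
}
```

Because btoa only inspects whether each code unit fits in one byte, this accepts exactly the range U+0000 to U+00FF, control characters included.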

See also: JavaScript has a Unicode problem

Upvotes: 1

Buzinas

Reputation: 11733

Since you want to return false if any non-ISO-8859-1 character is present, you can use a double negation: negate the character class, then negate the result of the test:

var str = "abcÂÃ";
console.log(validator(str)); // should return true;

str = "a 你 好";
console.log(validator(str)); // should return false;

str = "你 好";
console.log(validator(str)); // should return false;

str = "abc";
console.log(validator(str)); // should return true;

str = "╗";
console.log(validator(str)); // should return false;

function validator(str) {
  return !/[^\u0000-\u00ff]/g.test(str);
}

The expression !/[^\u0000-\u00ff]/g.test(str) checks whether the string contains any character outside the range U+0000 to U+00FF. If no such character is found, it returns true; otherwise, it returns false.
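For comparison, here is a minimal sketch of why the regex from the question gives false positives, along with two equivalent ways to write the check. All function names are hypothetical. The g flag from the question is dropped here, since test on a reused global regex keeps state in lastIndex between calls.

```javascript
// Names below are hypothetical, chosen for this illustration only.

// The pattern from the question: anchored at the start only, so it passes as
// soon as the string begins with a Latin-1 character.
const prefixOnly = (str) => /^[\u0000-\u00ff]+/.test(str);

// Anchoring both ends requires every code unit to be in range.
// `*` (not `+`) keeps the empty string valid, like the negated version.
const isLatin1Anchored = (str) => /^[\u0000-\u00ff]*$/.test(str);

// Regex-free equivalent: compare each UTF-16 code unit against 0xFF.
function isLatin1Loop(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

console.log(prefixOnly("a 你 好"));       // true (the false positive)
console.log(isLatin1Anchored("a 你 好")); // false
console.log(isLatin1Loop("abcÂÃ"));       // true
```

Both corrected variants should agree with the negated character class above on every input; which one to use is mostly a matter of taste.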

Upvotes: 10
