Woot4Moo

Reputation: 24316

Javascript validate user input against desired character set (encoding)

The scenario is as follows:

A user copies text from a web site that uses Win-1252 as its character set. This text is then sent to a database that I control, which uses ISO-8859-1 (whose printable characters are a subset of Win-1252). Is there a mechanism within JavaScript to inform the user that they are trying to insert "invalid" characters into the system? Ideally it would highlight the offending characters.

The general form of this problem: system A (the sending system) has a set of encodings, AsubE, and a different system B (the accepting system) has a set of encodings, BsubE. When BsubE is inside the universe of AsubE it is not a problem. The question is how to validate the user's input when BsubE is not a subset of AsubE.

Upvotes: 4

Views: 2495

Answers (2)

pimvdb

Reputation: 154818

Since some characters are not defined in the subset, you could use a regular expression to define those intervals:

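function isNotAllowed(char) {
    // The ranges must sit inside a character class; without the brackets
    // the pattern only matches the literal sequences "\x00-\x1f" or "\x7f-\x9f".
    return /[\x00-\x1f\x7f-\x9f]/.test(char); // 00 to 1f, or 7f to 9f
}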

Highlighting the characters as well is more complicated, but this function could be the core of it.
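
One way the highlighting could look is sketched below. It is not part of the original answer. It assumes that "invalid" means "not representable in ISO-8859-1". JavaScript strings are Unicode, so that is any code point above U+00FF (for example the Win-1252 curly quotes or the euro sign once pasted into a page). The names OUTSIDE_LATIN1, findInvalidChars, escapeHtml and highlightInvalid are made up for illustration, and the control-character ranges from the function above could be added to the pattern if they should be rejected too.

```javascript
// Sketch only: "invalid" is assumed to mean "outside U+0000..U+00FF".
// All names here are hypothetical helpers, not an existing API.
const OUTSIDE_LATIN1 = /[^\u0000-\u00ff]/gu;

// Returns [{ index, char }] for every character ISO-8859-1 cannot store.
function findInvalidChars(text) {
    return Array.from(text.matchAll(OUTSIDE_LATIN1), function (m) {
        return { index: m.index, char: m[0] };
    });
}

// Escapes the characters that are significant in HTML markup.
function escapeHtml(text) {
    const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
    return text.replace(/[&<>"]/g, function (c) { return map[c]; });
}

// Escapes the text first, then wraps each offending character in <mark>,
// so the result can be shown to the user as HTML.
function highlightInvalid(text) {
    return escapeHtml(text).replace(OUTSIDE_LATIN1, function (c) {
        return '<mark>' + c + '</mark>';
    });
}
```

The string returned by highlightInvalid could be assigned to the innerHTML of a preview element next to the input, while findInvalidChars gives the positions if a plain error message is preferred.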

Upvotes: 3

Anthony Mills

Reputation: 8784

There is no built-in facility in JavaScript to do this. Luckily, neither Windows-1252 nor ISO-8859-1 is a variable-width encoding, so you could use something that does understand character encodings, such as .NET, to generate a regular expression that tests for this.

For instance, in .NET, you could make a byte array with 256 bytes, one for each character, and then use each encoding to get the appropriate string. Figure out the differences in those strings, encode them into a regular expression, and there you go.
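
The same procedure can be sketched in JavaScript on runtimes that provide TextDecoder, which postdates this answer. This is only an illustration under stated assumptions: the runtime must support the "windows-1252" label (browsers do, as do Node.js builds with full ICU), and cp1252Differences and buildDifferenceRegex are hypothetical names. Because the WHATWG Encoding Standard treats the "iso-8859-1" label as an alias for windows-1252, the sketch does not decode ISO-8859-1 with TextDecoder; it relies on the fact that ISO-8859-1 maps each byte value directly to the code point with the same number.

```javascript
// Sketch only: assumes TextDecoder supports the "windows-1252" label.
// Both function names are hypothetical.

// Decodes all 256 byte values as Windows-1252 and collects the ones whose
// character differs from the ISO-8859-1 character for the same byte.
function cp1252Differences() {
    const bytes = Uint8Array.from({ length: 256 }, function (_, i) { return i; });
    const decoded = new TextDecoder('windows-1252').decode(bytes);
    const diff = [];
    for (let i = 0; i < 256; i++) {
        // ISO-8859-1 maps byte i to the code point with value i.
        if (decoded.charCodeAt(i) !== i) {
            diff.push({ byte: i, char: decoded[i] });
        }
    }
    return diff;
}

// Encodes the differing characters into a character-class regular expression.
function buildDifferenceRegex(diff) {
    const body = diff.map(function (d) {
        return '\\u' + d.char.charCodeAt(0).toString(16).padStart(4, '0');
    }).join('');
    return new RegExp('[' + body + ']', 'g');
}
```

The resulting regular expression could then be shipped to the page as a constant, so the table only has to be generated once.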

Upvotes: 1
