egorulz

Reputation: 1505

How to check for and replace non UTF-8 characters in tcl?

What's the best way to check whether a given string contains non-UTF-8 characters in Tcl? Is matching against the regexp "^[\x00-\x7f]+$" the only way forward?

I'm trying to write a Tcl proc that checks whether a given variable contains non-UTF-8 characters and, if it does, replaces it with "Not supported".

Upvotes: 2

Views: 2394

Answers (1)

Donal Fellows

Reputation: 137587

All of Tcl's characters are Unicode characters.

OK, that's not helpful. You actually appear to be asking about non-ASCII characters. Supposing you wanted to replace each non-ASCII character with a ?, you might use a regular expression substitution, like this:

regsub -all {[\u0080-\uffff]} $inputString "?" outputString

The key here is that the RE is in braces (almost always strongly recommended) and that we're using \uXXXX escape sequences (which the RE engine also understands). That may insert a lot of ?s, one for each non-ASCII character, but I'm sure you can adjust.
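If the goal is what the question describes (replace the whole value with "Not supported" rather than patching individual characters), a minimal sketch could look like the following. It assumes "non UTF-8" really means "non-ASCII", as above; the proc name asciiOrPlaceholder and its placeholder parameter are made up for illustration. It uses the built-in string is ascii test instead of a regular expression, so the regexp is not the only way forward.

```tcl
# Hypothetical helper (name and default are illustrative, not from the question).
# Returns the value unchanged when every character is below U+0080,
# otherwise returns the placeholder text.
proc asciiOrPlaceholder {value {placeholder "Not supported"}} {
    # [string is ascii] is true only if all characters are 7-bit ASCII;
    # an empty string counts as ASCII because -strict is not given.
    if {[string is ascii $value]} {
        return $value
    }
    return $placeholder
}

puts [asciiOrPlaceholder "hello"]        ;# prints: hello
puts [asciiOrPlaceholder "caf\u00e9"]    ;# prints: Not supported
```

To keep the ASCII parts and only mask the offending characters, the regsub shown earlier is the better fit.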

Upvotes: 3
