yellavon

Reputation: 2881

Detect invalid character

I am reading a tab-delimited text file (exported from Excel) into my Java application, but the file may contain invalid characters that I don't want. For example, I have seen these characters show up in a spreadsheet (I do not generate the spreadsheet):

�

Which when tab delimited show up as:

This is apparently the Unicode character 'REPLACEMENT CHARACTER' (U+FFFD). How do I detect this character in my Java string so I can abort the import?

String invalidString = "1234 � test2";

Upvotes: 0

Views: 2007

Answers (2)

Arkillon

Reputation: 49

You could create a regex that matches all of your 'valid' characters (include the space and the tab if they are legitimate in your data), like:

String regexValidCharacters = "[A-Za-z0-9 \\t]*";

and do something like :

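// Anything left over after stripping the valid characters is invalid.
if (!invalidString.replaceAll(regexValidCharacters, "").isEmpty()) {
    throw new IllegalArgumentException("Import aborted: invalid character found");
}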

There is probably a better solution, but this should work fine.

Upvotes: 0

tilpner

Reputation: 4328

The answer to this question depends on what you understand as invalid characters.

ASCII truncation

A simple check is to test whether each code point lies within a certain range. The lowest printable ASCII character is the space, with a decimal value of 32. The highest is ~, with a decimal value of 126. Note that the tab delimiter (decimal 9) falls outside this range and has to be allowed separately. This restricts the input to printable ASCII characters, which is bad for anyone using accents or similar.
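As a rough sketch of that range check, assuming you validate one line of the file at a time: the class name `PrintableAsciiCheck` and the method name `isPrintableAscii` are made up for this example, and letting the tab through is an assumption based on the file being tab-delimited.

```java
final class PrintableAsciiCheck {

    // Returns true if every code point is printable ASCII (32..126)
    // or a tab. The tab is allowed only because the input is assumed
    // to be tab-delimited; drop it if that does not apply to you.
    static boolean isPrintableAscii(String line) {
        return line.codePoints()
                .allMatch(cp -> cp == '\t' || (cp >= 32 && cp <= 126));
    }
}
```

You would call `isPrintableAscii` on each line and abort the import on the first `false`. Keep in mind that it also rejects accented letters.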

Printability

Another approach is to check whether a character is printable in a certain font. You can use the java.awt.Font class for that. It provides a method canDisplay, which returns whether the font has a glyph for that character. This could work, but it feels really awful. Still, it could be what you want; we can't know.

Valid letter or number

Another criterion might be whether the character is a valid letter or number. The java.lang.Character class provides the methods isLetter and isDigit to determine this.
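A minimal sketch of that criterion, using `Character.isLetterOrDigit`, which combines the two checks. The class and method names are hypothetical, and allowing the space and the tab is an assumption about what a tab-delimited line may legitimately contain.

```java
final class LetterOrDigitCheck {

    // Returns true if every code point is a Unicode letter or digit,
    // a space, or a tab. The separators are an assumption; extend the
    // condition with any punctuation your data legitimately uses.
    static boolean isLettersDigitsAndSeparators(String line) {
        return line.codePoints().allMatch(cp ->
                Character.isLetterOrDigit(cp) || cp == ' ' || cp == '\t');
    }
}
```

Unlike the ASCII range check, this accepts accented letters, while U+FFFD is still rejected because it is classified as a symbol rather than a letter or digit.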

Charsets

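We all know TANSTAPT (There Ain't No Such Thing As Plain Text), so you might well have used the wrong charset. The replacement character typically appears when bytes are decoded with a charset other than the one they were written in. Find out whether you're using the same charset as Excel.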
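If the charset turns out to be the culprit, two things may help, sketched below under assumptions: checking an already decoded string for U+FFFD directly, and decoding the raw bytes with a `CharsetDecoder` set to `CodingErrorAction.REPORT`, so that malformed input raises an exception instead of being silently replaced. The class and method names are invented for this example, and which charset Excel actually used for the export is something you have to find out yourself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictDecoding {

    // True if the decoded text contains U+FFFD REPLACEMENT CHARACTER.
    static boolean containsReplacementCharacter(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }

    // Decodes the bytes with the given charset and throws a
    // CharacterCodingException on malformed or unmappable input,
    // rather than substituting U+FFFD.
    static String decodeStrictly(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }
}
```

The first method answers the literal question of detecting the character. The second lets you abort the import before a bad string is ever created, for example by reading the file as bytes and catching `CharacterCodingException`.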

If these criteria don't fit your intent, you'll have to specify your needs further.

Upvotes: 1
