Reputation: 2510
I need to take a string in Java and determine whether all of the characters it contains are in a specified character set (e.g. ISO-8859-1). I've looked around quite a bit for a simple way to do this (including playing around with a CharsetDecoder), but have yet to find something.
What is the best way to take a string and determine if all the characters are within a given character set?
Upvotes: 22
Views: 8504
Reputation: 14873
The CharsetEncoder class in package java.nio.charset offers a canEncode method that tests whether a given character, or a whole character sequence, can be encoded in the charset.
Michael basically did something like this:
Charset.forName(CharEncoding.ISO_8859_1).newEncoder().canEncode("string")
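Note that CharEncoding.ISO_8859_1 relies on Apache Commons and may be replaced by the string "ISO-8859-1". On Java 7 and later, StandardCharsets.ISO_8859_1 gives you the Charset directly, without the Charset.forName lookup.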
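For completeness, here is a minimal, self-contained sketch of the same idea using only the JDK. The class name CharsetCheck and the helper isEncodable are made up for illustration; the only library calls are Charset.newEncoder() and CharsetEncoder.canEncode(CharSequence). The test strings are written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class CharsetCheck {
    // Hypothetical helper: returns true if every character of text
    // can be encoded in the given charset.
    static boolean isEncodable(String text, Charset charset) {
        // Encoders are stateful and not thread-safe, so create a fresh one per call.
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // "cafe" with e-acute (U+00E9), which ISO-8859-1 contains.
        System.out.println(isEncodable("caf\u00e9", StandardCharsets.ISO_8859_1)); // true
        // Euro sign (U+20AC), which ISO-8859-1 does not contain.
        System.out.println(isEncodable("\u20ac", StandardCharsets.ISO_8859_1));    // false
    }
}
```

A few charsets support decoding only; for those, Charset.canEncode() returns false and newEncoder() throws UnsupportedOperationException, so it may be worth checking that first if the charset name comes from user input.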
Upvotes: 32
Reputation: 234847
I think the easiest way is to build a table of which Unicode characters can be represented in the target character set encoding and then test each character in the string against it. For the ISO-8859 family, the table can usually be expressed as one or a few ranges of Unicode characters, which makes the test relatively easy. It's a lot of manual work, but it only needs to be done once.
EDIT: or use Aubin's answer if the charset is supported in your Java implementation. :)
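To make the range-table idea concrete, here is a rough sketch. The class RangeCheck, the table LATIN1_RANGES, and the method names are all made up for illustration. The single range U+0000 to U+00FF is an assumption that matches how Java's ISO-8859-1 charset behaves (it maps all 256 byte values); other members of the ISO-8859 family would need their own hand-built tables.

```java
class RangeCheck {
    // Assumed table for ISO-8859-1: a single inclusive range {first, last}.
    // Other charsets would list several ranges here.
    static final int[][] LATIN1_RANGES = { {0x0000, 0x00FF} };

    // Returns true if the code point falls inside any of the inclusive ranges.
    static boolean inAnyRange(int codePoint, int[][] ranges) {
        for (int[] range : ranges) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Tests every code point of the string against the table (Java 8+ for codePoints()).
    static boolean allInRanges(String text, int[][] ranges) {
        return text.codePoints().allMatch(cp -> inAnyRange(cp, ranges));
    }

    public static void main(String[] args) {
        System.out.println(allInRanges("caf\u00e9", LATIN1_RANGES)); // true
        System.out.println(allInRanges("\u20ac", LATIN1_RANGES));    // false
    }
}
```

Iterating over code points rather than chars means a surrogate pair is checked as one character instead of two halves.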
Upvotes: 2