Reputation: 59
I'm having trouble reading a text file into a String in Java. I have a text file (created in Eclipse, if that matters) that contains a short amount of text, approximately 98 characters. Reading that file into a String via several methods results in a String that is quite a bit longer: 1621 characters. All but the relevant 98 are invisible in the debugger/console.
I've tried the following methods to load the String:
Apache Commons IO:
FileUtils.readFileToString(new File(path));
FileUtils.readFileToString(new File(path), "UTF-8");
byte[] b = FileUtils.readFileToByteArray(new File(path));
new String(b, "UTF-8");
byte[] b = FileUtils.readFileToByteArray(new File(path));
Charset.defaultCharset().decode(ByteBuffer.wrap(b)).toString();
NIO:
new String(Files.readAllBytes(Paths.get(path)));
And so on.
Is there a method to strip away these control chars? Is there a way to read files to strings that doesn't have this issue?
As noted in the comments below, this behavior is due to a corrupted(?) file generated by Eclipse. I'd still be interested in hearing any strategies for trimming away control characters from Strings, though!
Upvotes: 0
Views: 4048
Reputation: 424973
If you want to strip out all non-printable characters, try this:
str = str.replaceAll("[^\\p{Graph}\n\r\t ]", "");
The regex matches all "invisible" characters except the ones we want to keep, in this case newlines, carriage returns, tabs and spaces.
\p{Graph} is a POSIX character class for all printable/visible characters. To negate a POSIX character class we can use a capital P, i.e. \P{Graph} (all non-printable/invisible characters). However, we don't want to remove newlines etc., so we need the negated character class [^\\p{Graph}\n\r\t ] instead.
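As a minimal sketch of how that pattern behaves, the class below wraps it in a helper and runs it on a string padded with NUL and BEL characters. The class name ControlCharStripper, the method name stripInvisible, and the sample input are made up for illustration; only the regex comes from the answer.

```java
import java.util.regex.Pattern;

public class ControlCharStripper {
    // Same pattern as in the answer: anything that is not a visible
    // character, newline, carriage return, tab, or space.
    private static final Pattern INVISIBLE =
            Pattern.compile("[^\\p{Graph}\n\r\t ]");

    // Hypothetical helper: returns s with all matches of INVISIBLE removed.
    public static String stripInvisible(String s) {
        return INVISIBLE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Illustrative input: visible text mixed with NUL (\u0000) and BEL (\u0007).
        String dirty = "\u0000He\u0007llo,\tworld\u0000\n";
        String clean = stripInvisible(dirty);
        System.out.println(dirty.length() + " -> " + clean.length()); // prints 16 -> 13
    }
}
```

One caveat: without the Pattern.UNICODE_CHARACTER_CLASS flag, \p{Graph} in Java covers US-ASCII only, so this pattern also removes non-ASCII letters such as é. If the text may contain such characters, compiling the pattern with that flag is one option to consider.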
Upvotes: 4
Reputation: 30126
Read it line by line into a StringBuilder, and then convert it to a String:
StringBuilder sb = new StringBuilder();
// try-with-resources closes the reader even if readLine() throws
try (BufferedReader file = new BufferedReader(new FileReader(fileName))) {
    String line;
    while ((line = file.readLine()) != null) {
        // readLine() drops the line terminator, so add one back
        sb.append(line).append('\n');
    }
}
return sb.toString();
Upvotes: 0