Martin Tuskevicius

Reputation: 2630

Java: reading text from a file results in strange formatting

Usually, when I read text files, I do it like this:

 File file = new File("some_text_file.txt");
 Scanner scanner = new Scanner(new FileInputStream(file));
 StringBuilder builder = new StringBuilder();
 while(scanner.hasNextLine()) {
     builder.append(scanner.nextLine());
     builder.append('\n');
 }
 scanner.close();
 String text = builder.toString();

There may be better ways, but this method has always worked for me perfectly.

For what I am working on right now, I need to read a large text file (over 700 kilobytes in size). Here is a sample of the text when opened in Notepad (the one that comes standard with any Windows operating system):

"lang"
{
    "Language"      "English"
    "Tokens"
    {
        "DOTA_WearableType_Daggers"     "Daggers"
        "DOTA_WearableType_Glaive"      "Glaive"
        "DOTA_WearableType_Weapon"      "Weapon"
        "DOTA_WearableType_Armor"       "Armor"

However, when I read the text from the file using the method that I provided above, the output is:

[Image: sample output]

I could not paste the output for some reason. I have also tried to read the file like so:

 File file = new File("some_text_file.txt");
 Path path = file.toPath();
 String text = new String(Files.readAllBytes(path));

... with no change in result.

How come the output is not as expected? I also tried reading a text file that I wrote and it worked perfectly fine.

Upvotes: 1

Views: 370

Answers (2)

Pratik Shelar

Reputation: 3212

// Pass the charset name explicitly; otherwise Scanner decodes with the default charset.
final Scanner scanner = new Scanner(new FileInputStream(file), "UTF-16");

Upvotes: 1

MartinTeeVarga

Reputation: 10908

It looks like an encoding problem. Open the file with a tool that can detect the encoding (like Notepad++) and find out how it is encoded. Then use the Scanner constructor that takes a charset name:

Scanner scanner = new Scanner(new FileInputStream(file), encoding);

Or you can simply experiment with it, trying different encodings. It looks like UTF-16 to me.
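
If you would rather not guess, a rough sketch along these lines checks the first bytes of the file for a byte-order mark (BOM) and decodes accordingly. The class `EncodingAwareReader` and its methods are hypothetical names made up for this example, not part of any library, and it assumes the file actually starts with a BOM (Notepad normally writes one when saving as "Unicode"); without a BOM it just uses the fallback charset you pass in.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
final class EncodingAwareReader {

    // Guess the charset from a leading BOM; return the fallback if none is found.
    // Limitation: a UTF-32LE BOM (FF FE 00 00) would be mistaken for UTF-16LE.
    static Charset detectByBom(byte[] bytes, Charset fallback) {
        if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(bytes, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        if (startsWith(bytes, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        return fallback;
    }

    // True if the array begins with the given unsigned byte values.
    private static boolean startsWith(byte[] bytes, int... prefix) {
        if (bytes.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((bytes[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Read the whole file and decode it with the detected (or fallback) charset.
    static String readText(Path path, Charset fallback) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        String text = new String(bytes, detectByBom(bytes, fallback));
        // UTF-8, UTF-16LE and UTF-16BE keep the BOM as the character U+FEFF; drop it.
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java EncodingAwareReader <file>");
            return;
        }
        System.out.print(readText(Paths.get(args[0]), StandardCharsets.UTF_8));
    }
}
```

The same idea applies to the `Files.readAllBytes` attempt in the question: `new String(bytes)` without a charset decodes with the default charset, so passing the charset explicitly is what changes the result.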

Upvotes: 2
