While reading a CSV I get a question mark at the beginning

I'm working on a small school exercise about Java text I/O. When I read a CSV file containing name prefixes (a Dutch thing) and surnames, I get a question mark at the beginning.

It's a small exercise in which I have to add my code to an existing project of 3 small files to practice text I/O. Project code: https://github.com/Remzi1993/klantenBestand

public void vulNamenLijst() {
    // TODO: Read the file "resources/NamenlijstGroot.csv" and put each line (<name prefix>,<surname>)
    // into the ArrayList namenLijst.

    file = new File("resources/NamenlijstGroot.csv");

    try (
            Scanner scanner = new Scanner(file);
    ) {
        while (scanner.hasNext()) {
            String line = scanner.nextLine();
            String[] values = line.split(",");
            String namePrefix = values[0];
            String surname = values[1];
            namenLijst.add(namePrefix + " " + surname);
        }
    } catch (FileNotFoundException e) {
        System.err.println("Data file doesn't exist!");
    } catch (Exception e) {
        System.err.println("Something went wrong");
        e.printStackTrace();
    }
}

Sorry for the mix of Dutch and English in the code. I try to write my own code in English, but this exercise already existed and I only had to add code at the // TODO to practice text I/O.

This is what I get:

My CSV file:

Upvotes: 1

Answers (4)

tahirhasanov

Reputation: 21

Also, if this issue appeared when sending the CSV file, check the encoding: a UTF-8 BOM could be the cause.

    if (text.startsWith("\uFEFF")) {
        // Strings are immutable, so the result of substring must be assigned back.
        text = text.substring(1);
    }

Upvotes: 0

Brian Agnew

Reputation: 272377

To strip the BOM with a 'standard' component, you can use BOMInputStream from Apache Commons IO. Note that BOMs come in multiple flavours (see here for more details). By default BOMInputStream looks for the UTF-8 BOM, and it can be configured to detect the others as well.

If you have a sizeable project, you may find that BOMInputStream is already on your classpath via commons-io.

Scanner accepts an InputStream (see here), so the wrapped stream can be passed straight to it.
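
If adding commons-io is not an option, the same idea can be sketched with the JDK alone. This is only a minimal sketch, not how BOMInputStream is implemented: it handles just the UTF-8 BOM, it assumes Java 11 or later, and the names `BomSkippingDemo`, `skipUtf8Bom` and `readNames` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class BomSkippingDemo {

    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // A pushback buffer of 3 bytes lets us "un-read" the header when it is not a BOM.
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = pushback.readNBytes(head, 0, 3);
        boolean hasBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    /** Reads "<prefix>,<surname>" lines and returns them as "prefix surname". */
    static List<String> readNames(InputStream csv) {
        List<String> names = new ArrayList<>();
        try (Scanner scanner = new Scanner(skipUtf8Bom(csv), StandardCharsets.UTF_8)) {
            while (scanner.hasNextLine()) {
                String[] values = scanner.nextLine().split(",", 2);
                if (values.length < 2) {
                    continue; // skip blank or malformed lines
                }
                // strip() removes the leading space left by an empty prefix
                names.add((values[0] + " " + values[1]).strip());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the CSV file, starting with a BOM.
        byte[] data = "\uFEFFde,Jong\n,Jansen\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readNames(new ByteArrayInputStream(data))); // prints [de Jong, Jansen]
    }
}
```

For a real file you would pass something like `new FileInputStream(file)` to `readNames` instead of the in-memory bytes.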

Upvotes: 1

Remzi Cavdar

Reputation: 187

I found an easy solution:

final String UTF8_BOM = "\uFEFF";

if (line.startsWith(UTF8_BOM)) {
    line = line.substring(1);
}

A simple working example:

File file = new File("resources/NamenlijstGroot.csv");

try (
    Scanner scanner = new Scanner(file, StandardCharsets.UTF_8);
) {
    while (scanner.hasNext()) {
        String line = scanner.nextLine().strip();

        final String UTF8_BOM = "\uFEFF";

        if (line.startsWith(UTF8_BOM)) {
            line = line.substring(1);
        }

        String[] values = line.split(",");
        String namePrefix = values[0];
        String surname = values[1];
        namenLijst.add(namePrefix + " " + surname);
    }
} catch (FileNotFoundException e) {
    System.err.println("Data file doesn't exist!");
} catch (Exception e) {
    System.err.println("Something went wrong");
    e.printStackTrace();
}

Upvotes: 0

Rob Audenaerde

Reputation: 20079

@funky is correct. Your file starts with a UTF-8 BOM.

Output of xxd:

00000000: efbb bf64 652c 4a6f 6e67 0a2c 4a61 6e73  ...de,Jong.,Jans
00000010: 656e 0a64 652c 5672 6965 730a 7661 6e20  en.de,Vries.van

The first three bytes are ef bb bf, which is the UTF-8 encoding of the byte order mark (U+FEFF).
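
If xxd is not at hand, the same check can be done from Java. This is a rough sketch under my own assumptions: `BomCheck`, `hexPrefix` and `startsWithUtf8Bom` are hypothetical names, and the demo uses in-memory bytes rather than your actual file.

```java
import java.nio.charset.StandardCharsets;

final class BomCheck {

    /** Formats the first `count` bytes as lowercase hex, separated by spaces. */
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xFF));
        }
        return sb.toString();
    }

    /** True if the data begins with the UTF-8 BOM bytes EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // For a real file, the bytes could come from
        // Files.readAllBytes(Path.of("resources/NamenlijstGroot.csv")).
        byte[] data = "\uFEFFde,Jong\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexPrefix(data, 3));      // prints ef bb bf
        System.out.println(startsWithUtf8Bom(data)); // prints true
    }
}
```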

Upvotes: 2
