alortimor

Reputation: 355

Regular expression on a CSV file in Java

I have to identify lines in a CSV file that match certain search criteria. The data in the CSV file looks something like this:

Wilbur Smith,Elephant Song,McMillain,1992,1
Wilbur Smith,Birds of Prey,McMillain,1992,1
George Orwell,Animal Farm,Secker & Warburg,1945,1
George Orwell,1984,Secker & Warburg,1949,1

The search criteria look like this:

Orwell,,,,
,Elephant,,,

The first criterion should match two lines, the second one line. I currently read the file as follows, without using the criteria above:

br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
    String[] dataItems = line.split(cvsSplitBy);

    // columns 0 to 2 hold author, title and publisher
    if (dataItems[0].contains(author) && dataItems[1].contains(title) && dataItems[2].contains(publisher)) {
        bk[i++] = line;
        // stop once the fixed-size array is full
        if (i >= bk.length) {break;}
    }
}

I am adding the matches to a fixed-size array. How can I use the criteria as a regular expression to identify a line?

Upvotes: 0

Views: 1149

Answers (1)

tima

Reputation: 1513

Seems like I'm in a minority here :) but here is a version using a regex in case you are interested.

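import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

BufferedReader br = null;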

String[] searches = new String[]{
            ",Animal Farm,Secker & Warburg,,",
            ",,Secker & Warburg,,",
            "George Orwell,,,,1",
            "Wilbur Smith,,,,",
            ",,,,1",
            "random,,,,1",
            "WILBUR SMITH,Birds of PREY,mcmillain,1992,1",
            ",,,,"
};

try {
    br = new BufferedReader(new FileReader("file.txt"));
    String line = null;

    // to store results of matches for easier output
    String[] matchResult = new String[searches.length];

    while ((line = br.readLine()) != null) {
        // go through all searches
        for (int i = 0; i < searches.length; i++) {

            /*
             *  Replace each comma that is not preceded by a letter or digit,
             *  or that is not followed by a letter, digit or comma, with
             *  ".*,.*" so that empty search fields match any text.
             */
            String searchPattern = searches[i].replaceAll("(?<![a-zA-Z0-9])\\,|\\,(?![a-zA-Z0-9\\,])", ".*,.*");

            // do the match on the line
            Matcher m = Pattern.compile("^" + searchPattern + "$", Pattern.CASE_INSENSITIVE).matcher(line);

            // store the result
            matchResult[i] = m.matches() ? "matches" : "no match";
        }

        // the format string has seven result columns, so matchResult[7] is passed but not printed
        System.out.println(String.format("%-50s %-10s %-10s %-10s %-10s %-10s %-10s %-10s", line, 
                    matchResult[0], matchResult[1], matchResult[2], matchResult[3], matchResult[4], matchResult[5], matchResult[6], matchResult[7]));
    }
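} catch (Exception e) {
    e.printStackTrace();
} finally {
    try {
        // br is still null if the file could not be opened
        if (br != null) {
            br.close();
        }
    } catch (IOException e) {
        // nothing useful to do if close fails
    }
}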

Output

Wilbur Smith,Elephant Song,McMillain,1992,1        no match   no match   no match   matches    matches    no match   no match  
Wilbur Smith,Birds of Prey,McMillain,1992,1        no match   no match   no match   matches    matches    no match   matches   
George Orwell,Animal Farm,Secker & Warburg,1945,1  matches    matches    matches    no match   matches    no match   no match  
George Orwell,1984,Secker & Warburg,1949,1         no match   matches    matches    no match   matches    no match   no match 
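
If the criteria should behave like the examples in the question (Orwell,,,, finding every line whose first field contains "Orwell"), one possible variation is to build the pattern field by field instead of rewriting the commas. This is only a sketch under a few assumptions: fields never contain commas or quotes, an empty criteria field means "anything", and a non-empty field is a case-insensitive substring match. The names CriteriaMatcher, toPattern, matches and filter are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CriteriaMatcher {

    // Any text that stays inside one field (assumes fields contain no commas).
    private static final String ANY_FIELD_TEXT = "[^,]*";

    /** Builds a pattern from a criteria line such as "Orwell,,,,". */
    public static Pattern toPattern(String criteria) {
        // limit -1 keeps trailing empty fields, so "Orwell,,,," gives five parts
        String[] parts = criteria.split(",", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(',');
            }
            regex.append(ANY_FIELD_TEXT);
            if (!parts[i].isEmpty()) {
                // quote the search text so characters like '&' or '.' stay literal
                regex.append(Pattern.quote(parts[i])).append(ANY_FIELD_TEXT);
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** True if the whole line satisfies the criteria. */
    public static boolean matches(String criteria, String line) {
        return toPattern(criteria).matcher(line).matches();
    }

    /** Collects matching lines in a list, so no array size has to be guessed. */
    public static List<String> filter(String criteria, List<String> lines) {
        Pattern pattern = toPattern(criteria);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).matches()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Wilbur Smith,Elephant Song,McMillain,1992,1",
                "Wilbur Smith,Birds of Prey,McMillain,1992,1",
                "George Orwell,Animal Farm,Secker & Warburg,1945,1",
                "George Orwell,1984,Secker & Warburg,1949,1");

        System.out.println(filter("Orwell,,,,", lines).size());   // prints 2
        System.out.println(filter(",Elephant,,,", lines).size()); // prints 1
    }
}
```

Pattern.quote keeps the search text literal, and because every field is limited to [^,]* a search term cannot accidentally match across two columns. A real CSV file with quoted fields would need a proper CSV parser instead.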

Upvotes: 1
