Tatiana

Reputation: 43

Regex composition

I want to parse a line from a CSV (comma-separated) file, something like this:

Bosh,Mark,[email protected],"3, Institute","83, 1, 2",1,21

I have to parse the file and replace the commas inside the double quotes with ';', like this:

Bosh,Mark,[email protected],"3; Institute","83; 1; 2",1,21

I use the following Java code, but it doesn't parse the line correctly:

Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
    String replacedMatch = matcher.group();
    String gr1 = matcher.group(1);
    gr1.trim();
    replacedMatch = replacedMatch.replace(",", ";");
    line = line.replace(matcher.group(), replacedMatch);
}

The output is:

Bosh,Mark,[email protected],"3; Institute";"83; 1; 2",1,21

Does anyone have any idea how to fix this?

Upvotes: 1

Views: 125

Answers (6)

Bart Kiers

Reputation: 170158

Here's a way:

import java.util.regex.*;

class Main {

  public static void main(String[] args) {

    String in = "Bosh,Mark,[email protected],\"3, \"\" Institute\",\"83, 1, 2\",1,21";
    String regex = "[^,\"\r\n]+|\"(\"\"|[^\"])*\"";
    Matcher matcher = Pattern.compile(regex).matcher(in);
    StringBuilder out = new StringBuilder();

    while(matcher.find()) {
      out.append(matcher.group().replace(',', ';')).append(',');
    }

    out.deleteCharAt(out.length() - 1);
    System.out.println(in + "\n" + out);
  }
}

which will print:

Bosh,Mark,[email protected],"3, "" Institute","83, 1, 2",1,21
Bosh,Mark,[email protected],"3; "" Institute","83; 1; 2",1,21

Tested on Ideone: http://ideone.com/fCgh7

Upvotes: 2

nhahtdh

Reputation: 56809

This is my solution for replacing , inside quotes with ;. It assumes that if a " appears inside a quoted string, it is escaped by another ". This property ensures that, counting from the start of the line up to the current character, the character is inside a quoted string exactly when the number of quotes " seen so far is odd.

// Test string, with the tricky case """", which resolves to
// a length 1 string of single quote "
String line = "Bosh,\"\"\"\",[email protected],\"3, Institute\",\"83, 1, 2\",1,21";

Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);

int start = 0;

StringBuilder output = new StringBuilder();

while (matcher.find()) {
  // System.out.println(matcher.group() + "\n " + matcher.start() + " " + matcher.end());
  output
    .append(line.substring(start, matcher.start())) // Append unrelated contents
    .append(matcher.group().replaceAll(",", ";")); // Append replaced string

  start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents

// System.out.println(output);

Although I cannot find a case where replacing the matched group the way you did, with line = line.replace(matcher.group(), replacedMatch);, would fail, I feel safer rebuilding the string from scratch.
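The parity property described at the top of this answer can also be applied directly, without a regex. The following is only a sketch under the same doubled-quote assumption; the class name QuoteParityDemo, the method name replaceCommasInQuotes, and the shortened sample line with a placeholder field are made up for illustration.

```java
public class QuoteParityDemo {

    // Replaces commas with semicolons only while inside a quoted field.
    // Assumption: a literal quote inside a field is escaped by doubling it ("").
    static String replaceCommasInQuotes(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean insideQuotes = false; // true while an odd number of quotes has been seen
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes; // every quote flips the parity
            }
            out.append(insideQuotes && c == ',' ? ';' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: "x" stands in for the e-mail field of the question.
        String line = "Bosh,\"\"\"\",x,\"3, Institute\",\"83, 1, 2\",1,21";
        System.out.println(replaceCommasInQuotes(line));
        // prints: Bosh,"""",x,"3; Institute","83; 1; 2",1,21
    }
}
```

A single pass like this never touches text outside the quotes, which is the same safety argument as rebuilding the string from the matcher positions.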

Upvotes: 3

Sunil Chavan

Reputation: 3004

Here is what you need:

String line = "Bosh,Mark,[email protected],\"3, Institute\",\"83, 1, 2\",1,21";
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Matcher matcher = regex.matcher(line);
// Loop with while (not if) so that every quoted field is processed
while (matcher.find()) {
    // Replace the commas inside the current quoted field
    String replacedMatch = matcher.group().replace(",", ";");
    line = line.replace(matcher.group(), replacedMatch);
}

line will then hold the value you need.

Upvotes: 1

SF Lee

Reputation: 1777

Shouldn't your regex be ("[^"]*") instead? In other words, your first line should be:

Pattern regex = Pattern.compile("(\"[^\"]*\")");

Of course, this is assuming you can't have quotes in the quoted values of your input line.
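One thing worth checking: with the question's code, changing the regex alone handles only the first quoted field, because the code calls find() once inside an if. The sketch below compares the two variants; the class and method names (IfVersusWhileDemo, fixFirstOnly, fixAll) and the shortened sample line are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IfVersusWhileDemo {

    private static final Pattern QUOTED = Pattern.compile("(\"[^\"]*\")");

    // Mirrors the question's structure with the corrected regex: one find() call only.
    static String fixFirstOnly(String line) {
        Matcher matcher = QUOTED.matcher(line);
        if (matcher.find()) {
            line = line.replace(matcher.group(), matcher.group().replace(",", ";"));
        }
        return line;
    }

    // Same body, but looping so that every quoted field is visited.
    // The matcher keeps scanning the original string; only the local variable is reassigned.
    static String fixAll(String line) {
        Matcher matcher = QUOTED.matcher(line);
        while (matcher.find()) {
            line = line.replace(matcher.group(), matcher.group().replace(",", ";"));
        }
        return line;
    }

    public static void main(String[] args) {
        String line = "Bosh,Mark,\"3, Institute\",\"83, 1, 2\",1,21";
        System.out.println(fixFirstOnly(line)); // Bosh,Mark,"3; Institute","83, 1, 2",1,21
        System.out.println(fixAll(line));       // Bosh,Mark,"3; Institute","83; 1; 2",1,21
    }
}
```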

Upvotes: 0

marc82ch

Reputation: 1004

Your regex is faulty. Why would you want to make sure there are no ] within the "..." expression? You'd rather make the regex reluctant (the default is greedy, which means it matches as much as it can).

"(\"[^\\]]*\")"

should be

"(\"[^\"]*\")"

But nhahtdh is right: you should use a proper CSV library to parse the line and replace , with ; in the values the parser returns. I'm sure you'll find a parser when googling "Java CSV parser".
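To see why the original character class misbehaves, it may help to compare the first match each pattern produces. This is a small sketch, not part of the original code: the class QuantifierDemo, the helper firstMatch, and the shortened sample line are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "Bosh,Mark,\"3, Institute\",\"83, 1, 2\",1,21";

        // Original class [^\]]: greedy, so it runs on to the last quote in the line.
        System.out.println(firstMatch("\"[^\\]]*\"", line)); // "3, Institute","83, 1, 2"

        // Negated quote class: stops at the next quote.
        System.out.println(firstMatch("\"[^\"]*\"", line));  // "3, Institute"

        // Reluctant quantifier: also stops at the next quote.
        System.out.println(firstMatch("\".*?\"", line));     // "3, Institute"
    }
}
```

The first result spans both quoted fields, which is why the comma between them was also turned into a semicolon in the question's output.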

Upvotes: 0

Silviu Burcea

Reputation: 5348

Have you tried making the regex lazy? Another idea: put a " inside the [] too. If you do that, you should get the expected output with the global flag set.
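Note that java.util.regex has no global flag as such; looping over find() (or calling replaceAll) plays that role. A minimal sketch of the looping variant, assuming quoted values contain no quotes themselves; the class name ReplaceEveryMatchDemo, the method name, and the shortened sample line are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceEveryMatchDemo {

    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Visits every quoted field and swaps its commas for semicolons.
    static String replaceCommasInQuotedFields(String line) {
        Matcher m = QUOTED.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = m.group().replace(',', ';');
            // quoteReplacement keeps any $ or \ in the field from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "Bosh,Mark,\"3, Institute\",\"83, 1, 2\",1,21";
        System.out.println(replaceCommasInQuotedFields(line));
        // prints: Bosh,Mark,"3; Institute","83; 1; 2",1,21
    }
}
```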

Upvotes: 0
