suresh

Reputation: 351

Splitting data inside quotes and comma using regex

I am struggling to write a regex for the following data.

I have input like this:

"%,.2f","mm/DD/YYYY","1"

I want a result like this:

%,.2f
mm/DD/YYYY
1

I tried multiple regexes, but none of them work.

Is there a way to do this in Java?

I am writing a parser in an internal framework that parses method calls and their arguments, like formatCurrency("%,.2f","mm/DD/YYYY","1"). I have written a regex that gets the function name and the arguments separately.

I can't simply split on commas, because function parameters can also contain commas. I think splitting on quotes would have the same problem. I thought the only way was to parse it with a regex, but regexes are difficult to understand...
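A quick way to see the problem with a plain comma split is to run String.split(",") on the sample input. A minimal sketch (the class and method names are only for illustration):

```java
public class NaiveSplitDemo {
    // Splits on every comma, with no notion of quoting
    static String[] naiveSplit(String input) {
        return input.split(",");
    }

    public static void main(String[] args) {
        String input = "\"%,.2f\",\"mm/DD/YYYY\",\"1\"";
        // The comma inside "%,.2f" is treated as a separator as well,
        // so this prints four lines instead of three
        for (String part : naiveSplit(input)) {
            System.out.println(part);
        }
    }
}
```

Output:

"%
.2f"
"mm/DD/YYYY"
"1"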

A regex to parse this would be the most helpful.

Upvotes: 1

Views: 1126

Answers (4)

Yassin Hajaj

Reputation: 21975

You could split on the following pattern, which covers three cases: beginning of the string, middle of the string, and end of the string:

(^"|","|"$)

Demo

  • ^" matches the quote at the beginning of your string
  • "," matches the separators in the middle
  • "$ matches the quote at the end

IdeOne Demo

Result: [, %,.2f, mm/DD/YYYY, 1]
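A minimal sketch of using this pattern with String.split. Because ^" is a positive-width match at index 0, split puts an empty string first (the leading empty element in the result above), while the trailing empty string left by "$ is dropped. The class and method names are only for illustration, and the approach assumes every value is quoted and that no value itself contains ",":

```java
import java.util.Arrays;

public class QuoteDelimiterSplit {
    // Split on a leading quote, a "," between values, or a trailing quote
    private static final String DELIMITERS = "(^\"|\",\"|\"$)";

    static String[] split(String input) {
        return input.split(DELIMITERS);
    }

    // Same split, but without the empty element produced by the leading ^" match
    static String[] values(String input) {
        String[] parts = split(input);
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "\"%,.2f\",\"mm/DD/YYYY\",\"1\"";
        System.out.println(Arrays.toString(split(input)));  // [, %,.2f, mm/DD/YYYY, 1]
        System.out.println(Arrays.toString(values(input))); // [%,.2f, mm/DD/YYYY, 1]
    }
}
```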

Upvotes: 0

David Ehrmann

Reputation: 7576

You could use a Matcher and find() each column:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "\"%,.2f\",\"mm/DD/YYYY\",\"1\"";
Matcher m = Pattern.compile("(?<=(?:^|,)\")([^\"]*)(?=\")").matcher(s);
List<String> cols = new ArrayList<>();
while (m.find()) {
    cols.add(m.group(1)); // group(0) works, too
}

System.out.println(cols);
// [%,.2f, mm/DD/YYYY, 1]

It's also somewhat easy to make the surrounding quotes optional, but there's a joke about regular expressions being write-only for a reason.
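A sketch of one way to make the quotes optional: a single pattern that, after the start of the string or a comma, matches either a quoted value or an unquoted run of characters. The class and method names are hypothetical, and it assumes quoted values never contain quotes and unquoted values never contain commas or quotes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalQuotes {
    // Each field starts at the beginning of the input or after a comma, and is either
    // a quoted value (group 1) or an unquoted run without commas or quotes (group 2)
    private static final Pattern FIELD =
            Pattern.compile("(?:^|,)(?:\"([^\"]*)\"|([^,\"]*))");

    static List<String> fields(String input) {
        List<String> cols = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            // Only one of the two alternatives participates in each match
            cols.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return cols;
    }

    public static void main(String[] args) {
        // Mix of quoted and unquoted values
        System.out.println(fields("\"%,.2f\",mm/DD/YYYY,\"1\"")); // [%,.2f, mm/DD/YYYY, 1]
    }
}
```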

A bit of explanation of the regex:

All quotes have to be escaped because the pattern is written as a Java string literal, so you'll see things like [^\"]* in the pattern.

(?<=(?:^|,)\")

checks for the comma (or the start of the string) and the quote before the text, without including them in the match

(?<=...)

Positive look-behind, which is zero-width and doesn't capture. It lets you look at the characters just before the text you're trying to match, even if those characters were already consumed by a previous match. It also means group(0) won't contain the comma or quote, so it's more fool-proof.

(?:^|,)\"

Match either the beginning of the string or a comma, followed by a quote, but don't capture the comma (again, so that group(0) works, and so that group(1) is the text rather than "" or a comma).

([^\"]*)

Match as many non-quote characters as possible and capture them. They'll be in group(1) since this is the first capturing group in the pattern.

(?=\")

Look ahead for a closing quote. The quote won't be included in group(0) because it's a look-ahead.

Upvotes: 0

Nikolas

Reputation: 44368

You want to extract the Strings between pairs of quotation marks ", which are separated by commas ,.

This regex captures the needed Strings, as long as the input keeps this format:

"(.*?)"

Demo at Regex101

Here is the same in Java. Don't forget to escape the quotation marks as \"; otherwise they would be read as the start or end of the String literal:

import java.util.*;
import java.util.regex.*;

String input = "\"%,.2f\",\"mm/DD/YYYY\",\"1\"";
List<String> results = new ArrayList<>();
Matcher m = Pattern.compile("\"(.*?)\"").matcher(input);
while (m.find()) {
    results.add(m.group(1)); // text between each pair of quotes
}

Upvotes: 1

Andrei Ciobanu

Reputation: 12838

I don't think it's a good idea to parse CSV by yourself. The format has a lot of corner cases, and for serious products I recommend using an existing library.
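One such corner case: in standard CSV (RFC 4180), a quote inside a quoted field is written as two quotes, so "say ""hi""" is a single field containing say "hi". A simple lazy regex like "(.*?)" breaks that field into pieces. A small demonstration (the class and method names are for illustration only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuoteCornerCase {
    // Collects the text between each pair of quotes, stopping at the next quote
    static List<String> lazyQuoteRegex(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile("\"(.*?)\"").matcher(input);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Two CSV fields: [say "hi"] and [x]; inside quotes, "" stands for one literal quote
        String line = "\"say \"\"hi\"\"\",\"x\"";
        // The first field falls apart into three pieces
        System.out.println(lazyQuoteRegex(line)); // prints [say , hi, , x]
    }
}
```

A CSV library such as Commons CSV is meant to handle this kind of escaping for you.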

I recommend Apache Commons CSV.

Just add the dependency to your POM file:

<dependencies>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.1</version>
    </dependency>
</dependencies>

The code is quite straightforward:

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.util.List;

public class CSVTester
{
    public static void main(String... args) throws IOException
    {
        String csvLine = "\"%,.2f\",\"mm/DD/YYYY\",\"1\"";

        // CSVFormat.DEFAULT uses " as the quote character, so the comma inside "%,.2f" stays in the field
        List<CSVRecord> records = CSVParser.parse(csvLine, CSVFormat.DEFAULT).getRecords();

        // Print the three fields of each record on separate lines
        records.forEach(record ->
                System.out.printf("%s%n%s%n%s%n", record.get(0), record.get(1), record.get(2)));
    }
}

The output should be as expected:

%,.2f
mm/DD/YYYY
1

Also, I only reach for regexes when there's nothing else left in my arsenal.

Regexes are hard to read, they can hide a lot of corner-case bugs, and they are a nightmare to debug and repair (after a few weeks you will forget how you built the regex, and you will spend a lot of time trying to understand it again).

Upvotes: 0
