Reputation: 351
I am struggling to write a regex for the following data.
I have input like this:
"%,.2f","mm/DD/YYYY","1"
I want a result like this:
%,.2f
mm/DD/YYYY
1
I tried multiple regexes, but nothing works.
Is there a way to get this in Java?
I am writing a parser in an internal framework that parses a method and its arguments, like formatCurrency("%,.2f","mm/DD/YYYY","1"). I have written a regex to get the function name and the arguments separately.
There are some constraints on splitting by comma, because function parameters can also contain a comma. I think even splitting on quotes will have the same problem. I thought the only way is to parse using regex, but understanding regex is difficult...
A regex to parse this would be most helpful.
Upvotes: 1
Views: 1126
Reputation: 21975
You could just use the following as a split pattern. It covers the three scenarios: beginning of string, middle of string, end of string
(^"|","|"$)
^" will match the one at the beginning of your string
"," will match the middle ones
"$ will match the one at the end
Result: [, %,.2f, mm/DD/YYYY, 1]
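As a minimal sketch of how this pattern could be applied, assuming it is passed to String.split on Java 8 or later: the leading quote yields an empty first element, which the hypothetical helper splitArgs below drops. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QuoteSplitSketch {
    // Splits on a leading quote, a "," separator, or a trailing quote.
    static List<String> splitArgs(String input) {
        List<String> parts =
                new ArrayList<>(Arrays.asList(input.split("^\"|\",\"|\"$")));
        // The match on the leading quote leaves an empty first element; drop it.
        if (!parts.isEmpty() && parts.get(0).isEmpty()) {
            parts.remove(0);
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [%,.2f, mm/DD/YYYY, 1]
        System.out.println(splitArgs("\"%,.2f\",\"mm/DD/YYYY\",\"1\""));
    }
}
```

Note that this only works while no argument itself contains the sequence "," (quote, comma, quote).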
Upvotes: 0
Reputation: 7576
You could use a Matcher and find() each column:
String s = "\"%,.2f\",\"mm/DD/YYYY\",\"1\"";
Matcher m = Pattern.compile("(?<=(?:^|,)\")([^\"]*)(?=\")").matcher(s);
List<String> cols = new ArrayList<>();
while (m.find()) {
    cols.add(m.group(1)); // group(0) works, too
}
System.out.println(cols);
// [%,.2f, mm/DD/YYYY, 1]
It's also somewhat easy to make the surrounding quotes optional, but there's a joke about regular expressions being write-only for a reason.
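One possible way to make the quotes optional, as a rough sketch rather than the pattern I had in mind above: try a quoted field first and fall back to a bare field that runs up to the next comma. The class name, the fields helper, and the pattern itself are assumptions for illustration. It does not handle an empty first field or quotes escaped inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalQuotesSketch {
    // Field start (input start or comma), then a quoted field or a bare one.
    private static final Pattern FIELD =
            Pattern.compile("(?:^|,)(?:\"([^\"]*)\"|([^,]*))");

    static List<String> fields(String input) {
        List<String> cols = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            // group(1) is set for quoted fields, group(2) for unquoted ones.
            cols.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return cols;
    }

    public static void main(String[] args) {
        // prints [%,.2f, mm/DD/YYYY, 1]
        System.out.println(fields("\"%,.2f\",mm/DD/YYYY,\"1\""));
    }
}
```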
A bit of an explanation on the regex:
All quotes have to be escaped because of Java strings, so you'll see things like [^\"]* in the pattern.
(?<=(?:^|,)\") matches the comma and quote before the text.
  (?<=...) is a positive look-behind, which is non-capturing. It lets you look behind the text you're trying to match, even if those characters were already examined by a previous match. It also means group(0) won't contain the comma or quote, so it's more fool-proof.
  (?:^|,)\" matches either the beginning of the input or a comma, followed by a quote. The (?:...) group is non-capturing, so group(1) is the text rather than the comma.
([^\"]*) matches as many non-quote characters as possible and captures them. They'll be in group(1), since this is the first capturing group in the pattern.
(?=\") looks ahead for a closing quote. This won't be included in group(0) because it's a look-ahead.
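To see what the look-arounds buy you, here is a small sketch that compares group(0) of the second match for the pattern above against a variant that consumes the delimiters instead. The consuming variant and the helper name secondWholeMatch are only there for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundSketch {
    // Returns group(0) of the second match, or null if there are fewer than two.
    static String secondWholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find() || !m.find()) {
            return null;
        }
        return m.group();
    }

    public static void main(String[] args) {
        String s = "\"%,.2f\",\"mm/DD/YYYY\",\"1\"";
        // Look-arounds check the delimiters without consuming them.
        // prints mm/DD/YYYY
        System.out.println(secondWholeMatch("(?<=(?:^|,)\")([^\"]*)(?=\")", s));
        // The consuming variant keeps the comma and quotes in group(0).
        // prints ,"mm/DD/YYYY"
        System.out.println(secondWholeMatch("(?:^|,)\"([^\"]*)\"", s));
    }
}
```

With the consuming variant you would have to use group(1) every time; with the look-arounds either group gives you the bare text.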
Upvotes: 0
Reputation: 44368
You want to extract the Strings between two quotation marks ", with the comma , as a delimiter.
This regex captures the needed Strings as long as you keep that format:
"(.*?)"
Here is the same in Java code, which might be better for you. Don't forget to escape the quotation marks as \"; otherwise they would be understood as the end/start of the String:
// input holds the quoted, comma-separated text, e.g. "%,.2f","mm/DD/YYYY","1"
List<String> results = new ArrayList<>();
Matcher m = Pattern.compile("\"(.*?)\"").matcher(input);
while (m.find()) {
    results.add(m.group(1)); // group(1) is the text between the quotes
}
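Applied to the full call from the question, this could look roughly like the sketch below. The CALL pattern for separating the function name from the argument list is my own assumption (the asker's regex for that step isn't shown), and the class and method names are hypothetical. It assumes every argument is double-quoted and contains no embedded quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CallParserSketch {
    // Assumed call shape: name(...), with every argument double-quoted.
    private static final Pattern CALL = Pattern.compile("^(\\w+)\\((.*)\\)$");
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    // Returns the function name followed by its arguments,
    // or an empty list if the text is not a call of the assumed shape.
    static List<String> parse(String call) {
        List<String> out = new ArrayList<>();
        Matcher c = CALL.matcher(call);
        if (!c.matches()) {
            return out;
        }
        out.add(c.group(1));
        Matcher m = QUOTED.matcher(c.group(2));
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [formatCurrency, %,.2f, mm/DD/YYYY, 1]
        System.out.println(parse("formatCurrency(\"%,.2f\",\"mm/DD/YYYY\",\"1\")"));
    }
}
```

Because the match runs from quote to quote, the comma inside %,.2f is not treated as a separator.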
Upvotes: 1
Reputation: 12838
I don't think it's a good idea to try to parse a CSV file by yourself. The format has a lot of corner cases, and for serious products I recommend using only an existing library.
I recommend Apache Commons CSV.
Just add the dependency to your POM file:
<dependencies>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.1</version>
    </dependency>
</dependencies>
The code is quite straightforward:
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.util.List;

/**
 * Parses one quoted CSV line and prints each field on its own line.
 */
public class CSVTester
{
    public static void main(String... args) throws IOException
    {
        String csvLine = "\"%,.2f\",\"mm/DD/YYYY\",\"1\"";
        // CSVFormat.DEFAULT treats double quotes as the quote character.
        List<CSVRecord> records = CSVParser.parse(csvLine, CSVFormat.DEFAULT).getRecords();
        records.forEach(record -> {
            System.out.printf("%s\n%s\n%s",
                    record.get(0), record.get(1), record.get(2));
        });
    }
}
The output should be as expected:
%,.2f
mm/DD/YYYY
1
Also, I only go to regex when there's nothing else left in my arsenal.
Regex code doesn't look nice, it can hide a lot of corner-case bugs, and it is a nightmare to debug and repair (after a few weeks you will forget how you built the regex and will spend a lot of time trying to re-understand it).
Upvotes: 0