CodeKingPlusPlus

Reputation: 16081

Java regular expression escaped commas

I have a csv file that I would like to use the String split() method on. I want each element of the array returned by split() to be the comma separated values in the csv. However, there are other commas in the csv file.

Fortunately, these other commas are escaped as '\,'.

I am having trouble getting the right regex for the split() method. I want to split by commas that are not preceded by the escape character.

My current code is:

String[] columns = new String[CONST];
columns = someString.split("*^\\,*");

To me this says: split by a comma but the character before the comma must not be the escape character. Any number of characters before or after the comma are allowed.

  1. How do I get the correct regular expression?

Upvotes: 1

Views: 18012

Answers (3)

user1133275

Reputation: 2735

The correct way is to use a parser (to handle \\, \, and , properly), but a simple regex can work:

jshell> "a,b".split("(?<!\\\\),")
$2 ==> String[2] { "a", "b" }

Here is how to test patterns that don't work:

jshell> "a,b".split("[^\\\\],")
$1 ==> String[2] { "", "b" }

and

jshell> "a,b".split("*^\\,*")
|  java.util.regex.PatternSyntaxException thrown: Dangling meta character '*' near index 0
*^\,*
^
|        at Pattern.error (Pattern.java:1997)
|        at Pattern.sequence (Pattern.java:2172)
|        at Pattern.expr (Pattern.java:2038)
|        at Pattern.compile (Pattern.java:1760)
|        at Pattern.<init> (Pattern.java:1409)
|        at Pattern.compile (Pattern.java:1065)
|        at String.split (String.java:2307)
|        at String.split (String.java:2354)
|        at (#6:1)
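
For the parser route mentioned at the start of this answer, a minimal hand-rolled sketch might look like the following. It assumes that a backslash escapes whatever character follows it; the class and method names are hypothetical and not taken from any library.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedCsvParser {
    // Splits a line on unescaped commas. Assumed convention: a backslash
    // escapes the next character, so "\," is a literal comma and "\\" is a
    // literal backslash. The escaping backslash itself is dropped.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escaped) {
                current.append(c);               // take the escaped character literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;                  // remember the escape, drop the backslash
            } else if (c == ',') {
                fields.add(current.toString());  // an unescaped comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) {
            current.append('\\');                // keep a lone trailing backslash as-is
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The Java literal below is the text: a\,b,c\\,d
        for (String field : parseLine("a\\,b,c\\\\,d")) {
            System.out.println(field);
        }
    }
}
```

Unlike a lookbehind regex, this treats the comma in c\\,d as a real separator, because the backslash in front of it is itself escaped.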

Upvotes: 0

EngineerWithJava54321

Reputation: 1225

Since I landed on this page from a search, I will answer the question as stated and give the correct pattern for completeness:

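columns = someString.split("(?<!\\\\),");

Note that you need four backslashes because it takes two backslashes in a Java string literal to produce one backslash in the string. In other words, "\\" creates the string \ . So "\\\\" creates the string \\, which the regex engine in turn reads as an escaped, literal \. Therefore you need four backslashes in a string literal to match one backslash in a regex. The (?<!...) construct is a negative lookbehind: it requires that the comma is not preceded by a backslash, without making the preceding character part of the delimiter. Brackets with a caret, as in [^\\\\], are another way to say "not this character", but that form consumes the character before the comma, so that character is dropped from the results.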
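
To see how patterns of this kind behave on input that actually contains an escaped comma, here is a small sketch comparing a negated character class with a negative lookbehind. The class name, method names, and sample string are made up for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Negated class: the character before the comma becomes part of the delimiter.
    static String[] splitWithClass(String line) {
        return line.split("[^\\\\],");   // the regex engine sees: [^\\],
    }

    // Negative lookbehind: only inspects the previous character, consumes nothing.
    static String[] splitWithLookbehind(String line) {
        return line.split("(?<!\\\\),"); // the regex engine sees: (?<!\\),
    }

    public static void main(String[] args) {
        String line = "ab\\,cd,ef";      // the text: ab\,cd,ef
        System.out.println(Arrays.toString(splitWithClass(line)));
        System.out.println(Arrays.toString(splitWithLookbehind(line)));
    }
}
```

With the negated class, the first field comes back as ab\,c because the d before the comma was matched as part of the delimiter; the lookbehind version keeps ab\,cd intact. Neither pattern removes the escaping backslash from the field.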

Alternatively, you can wrap CSV entries that you don't want split in quotes and then use the solution from "Java: splitting a comma-separated string but ignoring commas in quotes".

My personal preference would be to use split over a 3rd party parser because of the environment I code in.

Upvotes: 0

Adrian Shum

Reputation: 40036

First, a comma has no special meaning at the position where you are using it, so you can omit the escape.

The biggest problem in your regex is that * on its own has no meaning. * means zero or more occurrences of the preceding token.

So the regex should be

.*,.* (I think escaping the comma is still fine: .*\,.*)

Next, the usage: you are passing the regex to String.split(), which expects a regex for the delimiter. Therefore you should pass only , as the regex. Using .*,.* as the "delimiter" is going to give you unexpected results (try it and see).
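
As a quick illustration of why the argument to split() should describe only the delimiter, the sketch below runs both patterns on the same input. The class and helper names are hypothetical.

```java
import java.util.Arrays;

class DelimiterDemo {
    // Thin wrapper so the two patterns can be compared side by side.
    static String[] splitOn(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // "," matches only the comma, so the fields on either side survive.
        System.out.println(Arrays.toString(splitOn("a,b", ",")));  // prints [a, b]

        // ".*,.*" matches the whole line, so the whole line is the "delimiter".
        // Only empty strings remain, and split() discards trailing empty strings.
        System.out.println(splitOn("a,b", ".*,.*").length);        // prints 0
    }
}
```

Note that a plain , still splits on escaped commas; it only addresses the delimiter-versus-whole-line issue described above.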

Upvotes: 1
