Reputation:
This sample data is returned by a web service:
200,6, "California, USA"
I want to split it using split(",")
and tried to check the result with this simple code:
String loc = "200,6,\"California, USA\"";
String[] s = loc.split(",");
for (String f : s) {
    System.out.println(f);
}
Unfortunately, this is the result:
200
6
"California
USA"
The expected result is:
200
6
"California, USA"
I tried different regular expressions with no luck. Is it possible to make the split ignore commas that appear inside the double quotes?
UPDATE 1: Added C# Code
UPDATE 2: Removed C# Code
Upvotes: 7
Views: 249
Reputation: 77
Try this expression:
public class Test {
    public static void main(String[] args) {
        String loc = "200,6,\"Paris, France\"";
        // Split only on commas that are followed by an even number of quotes.
        String[] str1 = loc.split(",(?=(?:[^\"]|\"[^\"]*\")*$)");
        for (String tmp : str1) {
            System.out.println(tmp);
        }
    }
}
Upvotes: 0
Reputation: 3068
An easier solution might be to use an existing library, such as OpenCSV to parse your data. This can be accomplished in two lines using this library:
CSVParser parser = new CSVParser();
String [] data = parser.parseLine(inputLine);
This becomes especially important if you get more complex CSV values back in the future (multiline values, values with escaped quotes inside an element, etc.). If you don't want to add the dependency, you could always use their code as a reference (though it is not based on regex).
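For the no-dependency route, a minimal hand-written sketch might look like the following. It is not OpenCSV's code; the class name SimpleCsvLine and the method parse are hypothetical. It assumes a single line of input and the common CSV convention that a doubled quote ("") inside a quoted field stands for one literal quote. Unlike the regex split, it strips the surrounding quotes from each field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenCSV: splits one CSV line into fields.
final class SimpleCsvLine {

    // Splits on commas that are outside double quotes. Surrounding quotes are
    // removed, and a doubled quote ("") inside a quoted field is read as one
    // literal quote character (an assumed escaping convention).
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside the field
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are kept while in quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Prints 200, 6 and California, USA on three separate lines.
        for (String field : parse("200,6,\"California, USA\"")) {
            System.out.println(field);
        }
    }
}
```

This sketch does not handle multiline values; for those a real CSV library is still the safer choice.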
Upvotes: 2
Reputation: 21773
If there's a good lexer/parser library for Java, you could define a lexer like the following pseudo-lexer code:
Delimiter: ,
Item: ([^,"]+) | ("[^"]*")
Data: Item Delimiter Data | Item
A lexer starts at the top-level rule (in this case Data) and tries to form tokens out of the string until it either cannot continue or the string is used up. For your string, that would produce three items (200, 6 and "California, USA") with a delimiter between each.
(I learned about how lexers work from the guide to PLY, a Python lexer/parser: http://www.dabeaz.com/ply/ply.html )
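Without pulling in a lexer library, the same idea can be sketched by hand in Java. This is only an illustration under assumptions: the class name ItemLexer and method tokenize are made up, and the quoted form of Item is taken to allow commas between the quotes (otherwise "California, USA" could not be a single item). It follows the Data rule literally, so empty fields and trailing commas are rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-rolled lexer for: Data := Item (Delimiter Item)*
final class ItemLexer {

    // Item: a quoted run (commas allowed inside), or an unquoted run
    // containing neither commas nor quotes.
    private static final Pattern ITEM = Pattern.compile("\"[^\"]*\"|[^,\"]+");

    static List<String> tokenize(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        int pos = 0;
        while (true) {
            // Try to read one Item starting exactly at pos.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Expected an item at index " + pos);
            }
            items.add(m.group());
            pos = m.end();

            if (pos == input.length()) {
                return items; // the string is all gone
            }
            // Anything after an Item must be the Delimiter.
            if (input.charAt(pos) != ',') {
                throw new IllegalArgumentException("Expected ',' at index " + pos);
            }
            pos++; // consume the delimiter and loop for the next Item
        }
    }

    public static void main(String[] args) {
        // Prints 200, 6 and "California, USA" (quotes kept) on separate lines.
        for (String item : tokenize("200,6,\"California, USA\"")) {
            System.out.println(item);
        }
    }
}
```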
Upvotes: 0
Reputation: 131
,(?=(?:[^"]|"[^"]*")*$)
This is the regex you want. (To put it in the split function, you'll need to escape the quotes in the Java string literal.)
Explanation
You need to find every ',' that is not inside quotes. That requires a lookahead (http://www.regular-expressions.info/lookaround.html) to check whether the comma currently being matched is inside or outside quotes.
To do that, the lookahead ensures that the matching ',' is followed by an EVEN number of '"' characters, which means it lies outside quotes.
So
(?:[^"]|"[^"]*")*$
matches only when everything up to the end of the string is made of single non-quote characters and complete pairs of quotes with non-quote characters between them
(?=(?:[^"]|"[^"]*")*$)
looks ahead for the above match without consuming any characters
,(?=(?:[^"]|"[^"]*")*$)
and finally, this matches every ',' that satisfies the above lookahead
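To make the "even number of quotes" rule concrete, here is a small sketch that runs the regex next to a plain loop applying the same test by hand. The class and method names (EvenQuoteSplit, splitWithRegex, splitByCountingQuotes) are made up for the illustration, and it assumes single-line input with balanced quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class comparing the regex with a manual quote count.
final class EvenQuoteSplit {

    // The regex from above, with the quotes escaped for a Java string literal.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]|\"[^\"]*\")*$)";

    static String[] splitWithRegex(String line) {
        return line.split(COMMA_OUTSIDE_QUOTES);
    }

    // Same rule without a regex: split at a comma only when the rest of the
    // line after it contains an even number of quote characters.
    static List<String> splitByCountingQuotes(String line) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ',' && countQuotes(line, i + 1) % 2 == 0) {
                parts.add(line.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(line.substring(start));
        return parts;
    }

    // Counts '"' characters in s from index 'from' to the end.
    private static int countQuotes(String s, int from) {
        int count = 0;
        for (int i = from; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String loc = "200,6,\"California, USA\"";
        // Both lines print [200, 6, "California, USA"]
        System.out.println(Arrays.toString(splitWithRegex(loc)));
        System.out.println(splitByCountingQuotes(loc));
    }
}
```

In both versions the rest of the line is rescanned for each comma, so the cost grows roughly quadratically with line length; for short lines like the sample that is unlikely to matter.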
Upvotes: 3