Reputation: 223
I have some text like this:
//(10,0,'Computer_accessibility','',''),(13,0,'History_of_Afghanistan','',''),(14,0,'Geography_of_Afghanistan','','')
and I wrote a pattern:
public final static Pattern r_english = Pattern.compile("\\((.*?),(.*?),(.*?),(.*?),(.*?)\\)");
This works well in Java to extract m.group(1) (e.g. 13) and m.group(3) (e.g. History_of_Afghanistan), where m is a matcher. However, it breaks on text like the following, because Washington,_D.C. (i.e. what should be m.group(3)) contains a comma:
(8543,0,'Washington,_D.C.','',''),(8546,0,'Extermination_camp','','')
Can someone help me modify the regex so that it extracts Washington,_D.C. correctly? Thanks
Upvotes: 2
Views: 123
Reputation: 1884
You need to change your regular expression so that it fits all the matches you want to retrieve, e.g.:
/\((.*?),(.*?),'(.*?)','(.*?)','(.*?)'\)/g
Working Example @ regex101
You need to translate/escape the above regular expression into a Java-compatible one, e.g.:
public static String REGEX_PATTERN = "\\((.*?),(.*?),'(.*?)','(.*?)','(.*?)'\\)";
Then iterate over all the matches with find(), which mimics the /g modifier, e.g.:
while (matcher.find()) {
    // read matcher.group(1) ... matcher.group(5) here
}
Java Working Example:
package SO40002225;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static String INPUT;
    public static String REGEX_PATTERN;

    static {
        INPUT = "(8543,0,'Washington,_D.C.','',''),(8546,0,'Extermination_camp','',''),(8543,0,'Washington,_D.C.','',''),(8546,0,'Extermination_camp','','')";
        REGEX_PATTERN = "\\((.*?),(.*?),'(.*?)','(.*?)','(.*?)'\\)";
    }

    public static void main(String[] args) {
        String text = INPUT;
        Pattern pattern = Pattern.compile(REGEX_PATTERN);
        Matcher matcher = pattern.matcher(text);
        // Each call to find() advances to the next tuple in the input.
        while (matcher.find()) {
            String mg1 = matcher.group(1);
            String mg2 = matcher.group(2);
            String mg3 = matcher.group(3);
            String mg4 = matcher.group(4);
            String mg5 = matcher.group(5);
            System.out.println("Matching group #1: " + mg1);
            System.out.println("Matching group #2: " + mg2);
            System.out.println("Matching group #3: " + mg3);
            System.out.println("Matching group #4: " + mg4);
            System.out.println("Matching group #5: " + mg5);
        }
    }
}
Removed the escaping of the commas within the regular expression. As pointed out by Pshemo, , is not a metacharacter, and here it is not being used inside a bounded repetition quantifier {min,max}.
Upvotes: 1
Reputation: 11943
Change your third capture group to capture everything until a closing ' is reached. That allows every character (including your comma) to be captured.
UPDATE: to allow backslash-escaped 's as well, the regex looks like this (a literal backslash in a Java regex string has to be written as \\\\). Credit goes to Pshemo, see the comments.
public final static Pattern r_english = Pattern.compile("\\((.*?),(.*?),('(?:[^'\\\\]|\\\\.)*'),(.*?),(.*?)\\)");
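A minimal sketch of how this could be used, assuming quotes inside a field are escaped with a backslash (as in a MySQL dump). The class name QuotedFieldDemo, the extract helper and the O\'Reilly sample row are made up for illustration. The quoted group is written here as "anything except a quote or a backslash, or a backslash followed by any character", which is one way to express the idea:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldDemo {
    // Group 3 is a quoted field: runs of characters that are neither a quote
    // nor a backslash, or a backslash followed by any character (e.g. \').
    static final Pattern ROW = Pattern.compile(
        "\\((.*?),(.*?),('(?:[^'\\\\]|\\\\.)*'),(.*?),(.*?)\\)");

    // Returns "id|title" for every tuple found in the text (hypothetical helper).
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            out.add(m.group(1) + "|" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up input: the second row contains a backslash-escaped quote.
        String text = "(8543,0,'Washington,_D.C.','',''),(1,0,'O\\'Reilly','','')";
        extract(text).forEach(System.out::println);
    }
}
```

Note that group 3 still includes the surrounding quotes here, so they would need to be stripped if only the bare title is wanted.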
Upvotes: 3
Reputation: 11
You should make your regex more specific to your case. For example:
\((.*?),(.*?),('.*?'),('.*?'),('.*?')\)
I used the quote character ' as the delimiter, so this solution is also unaffected by any further parentheses inside groups 3-5.
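A rough sketch of this approach in Java, with the outer parentheses escaped so that they match literally. The class name QuoteDelimitedDemo, the helpers unquote and firstTitle, and the Foo_(bar) row are assumptions for illustration only. Since the quotes sit inside the capture groups, they are stripped afterwards:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDelimitedDemo {
    // Groups 3-5 each capture a quoted field, including its quotes.
    static final Pattern ROW = Pattern.compile(
        "\\((.*?),(.*?),('.*?'),('.*?'),('.*?')\\)");

    // Drops the first and last character (the surrounding quotes).
    static String unquote(String s) {
        return s.substring(1, s.length() - 1);
    }

    // Returns the unquoted third field of the first tuple, or null if none matches.
    static String firstTitle(String text) {
        Matcher m = ROW.matcher(text);
        return m.find() ? unquote(m.group(3)) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstTitle("(8543,0,'Washington,_D.C.','','')"));
        // Made-up row with parentheses inside the quoted title.
        System.out.println(firstTitle("(7,0,'Foo_(bar)','','')"));
    }
}
```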
Regards
Upvotes: 1