Reputation: 2050
I have two different sources feeding input files to my application. Their filename patterns differ, yet they contain common information that I want to retrieve.
Using regex named groups seemed convenient because it lets me share the extraction code between both sources. It has a limit, though: I cannot concatenate the two patterns if they use the same group names.
In other words, this:
String PATTERN_GROUP_NAME = "name";
String PATTERN_GROUP_DATE = "date";
String PATTERN_IMPORT_1 = "(?<" + PATTERN_GROUP_NAME + ">[a-z]{3})_(?<" + PATTERN_GROUP_DATE + ">[0-9]{14})_(stuff stuf)\\.xml";
String PATTERN_IMPORT_2 = "(stuff stuf)_(?<" + PATTERN_GROUP_DATE + ">[0-9]{14})_(?<" + PATTERN_GROUP_NAME + ">[a-z]{3})_(other stuff stuf)\\.xml";
Pattern universalPattern = Pattern.compile(PATTERN_IMPORT_1 + "|" + PATTERN_IMPORT_2);
try {
DirectoryStream<Path> list = Files.newDirectoryStream(workDirectory);
for (Path file : list) {
Matcher matcher = universalPattern.matcher(file.getFileName().toString());
name = matcher.group(PATTERN_GROUP_NAME);
fileDate = dateFormatter.parseDateTime(matcher.group(PATTERN_GROUP_DATE));
(...)
will fail with a java.util.regex.PatternSyntaxException
because the named capturing groups are already defined.
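A minimal reproduction of the failure, reduced to the two named groups, might look like the sketch below. The class name `DuplicateGroupDemo` and the helper `compileError` are hypothetical and exist only for illustration; the shortened patterns are assumptions derived from the ones above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DuplicateGroupDemo {
    // Hypothetical helper: returns the error description if the regex
    // does not compile, or null if it compiles fine.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String first = "(?<name>[a-z]{3})_(?<date>[0-9]{14})";
        String second = "(?<date>[0-9]{14})_(?<name>[a-z]{3})";

        // Each pattern compiles on its own.
        System.out.println(compileError(first));
        System.out.println(compileError(second));

        // Joined with "|", both group names are defined twice in one
        // pattern, which java.util.regex rejects at compile time.
        System.out.println(compileError(first + "|" + second));
    }
}
```

The exception is thrown by `Pattern.compile`, so the loop over the directory is never reached.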
What would be the most efficient / elegant way of solving this problem?
Edits:
It goes without saying, but the two patterns I match my input files against are different enough that no input file can match both.
Upvotes: 2
Views: 762
Reputation: 271
I agree with Joop Eggen: two patterns are simple and easy to maintain.
Just for fun, here is a single-pattern implementation for your specific case (a little bit longer and uglier).
String[] inputs = {
"stuff stuf_20111130121212_abc_other stuff stuf.xml",
"stuff stuf_20111130151212_def_other stuff stuf.xml",
"abc_20141220202020_stuff stuf.xml",
"def_20140820202020_stuff stuf.xml"
};
String lookAhead = "(?=([a-z]{3}_[0-9]{14}_stuff stuf\\.xml)|(stuff stuf_[0-9]{14}_[a-z]{3}_other stuff stuf\\.xml))";
String onePattern = lookAhead
+ "((?<name>[a-z]{3})_(other stuff stuf)?|(stuff stuf_)?(?<date>[0-9]{14})_(stuff stuf)?){2}\\.xml";
Pattern universalPattern = Pattern.compile(onePattern);
for (String input : inputs) {
Matcher matcher = universalPattern.matcher(input);
if (matcher.find()) {
//System.out.println(matcher.group());
String name = matcher.group("name");
String fileDate = matcher.group("date");
System.out.println("name : " + name + " fileDate: "
+ fileDate);
}
}
The output:
name : abc fileDate: 20111130121212
name : def fileDate: 20111130151212
name : abc fileDate: 20141220202020
name : def fileDate: 20140820202020
Actually, in your case the "lookAhead" is not necessary for extracting the groups; it only checks that the whole filename has one of the two expected shapes. Since a single pattern cannot define two groups with the same name, you normally need to restructure the pattern:
From AB|BA ---> (A|B){2}
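To illustrate what the lookahead adds on top of the `(A|B){2}` form, here is a small sketch that compiles the pattern above with and without it. The class name `ReorderedGroupsDemo`, the constants `LOOSE` and `STRICT`, the helper `accepts`, and the malformed sample filename are hypothetical; the regex strings are the ones from the code above. It uses `matches()` rather than `find()` so that the whole filename must be consumed.

```java
import java.util.regex.Pattern;

class ReorderedGroupsDemo {
    // Lookahead from the answer: the filename must have one of the two shapes.
    static final String LOOKAHEAD =
        "(?=([a-z]{3}_[0-9]{14}_stuff stuf\\.xml)"
        + "|(stuff stuf_[0-9]{14}_[a-z]{3}_other stuff stuf\\.xml))";

    // (A|B){2} body from the answer: each named group is defined only once.
    static final String BODY =
        "((?<name>[a-z]{3})_(other stuff stuf)?"
        + "|(stuff stuf_)?(?<date>[0-9]{14})_(stuff stuf)?){2}\\.xml";

    static final Pattern LOOSE = Pattern.compile(BODY);
    static final Pattern STRICT = Pattern.compile(LOOKAHEAD + BODY);

    // Hypothetical helper: true if the whole filename matches the pattern.
    static boolean accepts(Pattern pattern, String fileName) {
        return pattern.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String valid = "abc_20141220202020_stuff stuf.xml";
        String malformed = "abc_def_.xml"; // two names, no date

        System.out.println(accepts(LOOSE, valid));
        System.out.println(accepts(STRICT, valid));

        // Without the lookahead, (A|B){2} also allows A twice, so the
        // malformed name is accepted and the "date" group stays null.
        System.out.println(accepts(LOOSE, malformed));
        System.out.println(accepts(STRICT, malformed));
    }
}
```

If the input directory is known to contain only the two filename shapes, the looser pattern is enough; otherwise the lookahead (or two separate patterns) guards against names where one alternative repeats.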
Upvotes: 1
Reputation: 109593
Use two patterns - then group names can be equal.
You asked for efficient and elegant. Theoretically, one pattern could be more efficient, but that is irrelevant here.
The code will be slightly longer but more readable (readability being a weakness of regex), which makes it easier to maintain.
In pseudo-code:
Matcher m = firstPattern.matcher ...
if (!m.matches()) {
m = secondPattern.matcher ...
if (!m.matches()) {
continue;
}
}
name = m.group(NAME_GROUP);
...
(Everyone wants to write overly clever code, but simplicity may be called for.)
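A runnable version of the pseudo-code might look like the following sketch. The two regexes are taken from the question; the class name `TwoPatternDemo`, the method `parse`, and the choice to return the two values as an `Optional<String[]>` are assumptions made for illustration, not part of the original code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoPatternDemo {
    static final String NAME_GROUP = "name";
    static final String DATE_GROUP = "date";

    // Compiled separately, so both may use the same group names.
    static final Pattern FIRST = Pattern.compile(
        "(?<" + NAME_GROUP + ">[a-z]{3})_(?<" + DATE_GROUP + ">[0-9]{14})_(stuff stuf)\\.xml");
    static final Pattern SECOND = Pattern.compile(
        "(stuff stuf)_(?<" + DATE_GROUP + ">[0-9]{14})_(?<" + NAME_GROUP
        + ">[a-z]{3})_(other stuff stuf)\\.xml");

    // Hypothetical helper: returns {name, date}, or empty if neither pattern matches.
    static Optional<String[]> parse(String fileName) {
        Matcher m = FIRST.matcher(fileName);
        if (!m.matches()) {
            m = SECOND.matcher(fileName);
            if (!m.matches()) {
                return Optional.empty();
            }
        }
        // Whichever pattern matched, the group names are the same.
        return Optional.of(new String[] { m.group(NAME_GROUP), m.group(DATE_GROUP) });
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abc_20141220202020_stuff stuf.xml",
            "stuff stuf_20111130121212_abc_other stuff stuf.xml",
            "unrelated.txt"
        };
        for (String input : inputs) {
            Optional<String[]> parsed = parse(input);
            if (parsed.isPresent()) {
                System.out.println("name : " + parsed.get()[0] + " fileDate: " + parsed.get()[1]);
            } else {
                System.out.println("skipped: " + input);
            }
        }
    }
}
```

In the directory loop from the question, `parse(file.getFileName().toString())` would replace the single `universalPattern` matcher, and an empty result corresponds to the `continue` in the pseudo-code.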
Upvotes: 2