Reputation: 467
I am trying to parse a file name according to a given pattern, but I am not able to get the match right. Here is a sample file name:
CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.doc
And here are my requirements:
Up to the character #, the file name can contain anything. After #, I have to find the character _ or the character - to separate the strings. The strings between the separators (either _ or -, but not both) can contain any other character. So eventually, after the character #, I must have exactly three (3) _ or - characters combined. The string should end with .doc, .docx or .odt, but NOT with .ok.doc, .ok.docx or .ok.odt.
Here is what I tried:
(.*)#([^_-]+)[_-]([^_-]+)[_-]([^_-]+)[_-]([^_-]+)\.[doc|odt|docx].*(?<!\.ok)$
But this forces me to end the string with .doc.ok or .docs.ok or .docx.ok, and actually I want to retain the file extension at the end.
If I try this:
(.*)#([^_-]+)[_-]([^_-]+)[_-]([^_-]+)[_-]([^_-]+)\..*(?<!ok\.[doc|odt|docx])$
it won't work.
Any help would be appreciated. Thank you :)
Upvotes: 1
Views: 1366
Reputation: 626689
It seems you can use
^([^#]*#[^-_]*)[-_](.*)$(?<=(?<!\.ok)\.(?:docx?|odt)$)
Explanation:
^ - start of string (not necessary when used with .matches(), but not harmful)
([^#]*#[^-_]*) - Group 1: any 0+ characters other than # ([^#]*), followed by #, and then any 0+ characters other than - and _ ([^-_]*)
[-_] - a single - or _ separator
(.*)$ - Group 2: any 0+ characters other than a newline (since DOTALL mode is not specified) up to the end of the string, BUT...
(?<=(?<!\.ok)\.(?:docx?|odt)$) - after reaching the end, check that there is .doc, .docx or .odt at the end (see (?<=\.(?:docx?|odt)$)) that is not preceded by .ok (see (?<!\.ok)). In PCRE, these conditions should be split; Java regex seems to cope with alternations inside the lookbehind.
A lookahead-based alternative:
^([^#]*#[^-_]*)[-_](?=.*(?<!\.ok)\.(?:docx?|odt)$)(.*)$
See the regex101 demo. It is the same, but all the end-of-string checks are done after matching the - or _.
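To make the lookahead variant concrete, here is a minimal sketch that runs it against the sample name from the question. The class name LookaheadDemo and the helper split are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Lookahead variant: the extension check runs right after the first separator.
    static final Pattern P = Pattern.compile(
        "^([^#]*#[^-_]*)[-_](?=.*(?<!\\.ok)\\.(?:docx?|odt)$)(.*)$");

    // Hypothetical helper: returns {group 1, group 2}, or null when the name is rejected.
    static String[] split(String name) {
        Matcher m = P.matcher(name);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = split("CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.doc");
        System.out.println(parts[0]); // CRS-ISAU-RPV#3430
        System.out.println(parts[1]); // Dedalus_Conc.ok.erto_AOTreviglio.doc
        // A name ending in .ok.odt is rejected by the negative lookbehind.
        System.out.println(
            split("CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.ok.odt") == null); // true
    }
}
```

Because the lookahead sits directly after the separator, a name with a forbidden ending is rejected before the trailing (.*) is consumed.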
See the Java demo:
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Demo {
    public static void main(String[] args) {
        List<String> strs = Arrays.asList(
            "CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.doc",
            "CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.docx",
            "CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.odt",
            "CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.ok.docx",
            "CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.ok.odt"
        );
        // Compile once. Groups: 1 = text up to the first separator after #, 2 = the separator, 3 = the rest.
        Pattern p = Pattern.compile("^([^#]*#[^-_]*)([-_])(.*)$(?<=(?<![.]ok)[.](?:docx?|odt)$)");
        for (String str : strs) {
            System.out.println("----------\nMatching: " + str);
            Matcher m = p.matcher(str);
            if (m.matches()) {
                System.out.println(m.group(1));
                System.out.println(m.group(2));
                System.out.println(m.group(3));
            } else {
                System.out.println("No match");
            }
        }
    }
}
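The patterns above split on the first separator only. If the "exactly three _ or - characters after #" requirement from the question also has to be enforced, one possible sketch is shown below. It is not part of the original answer: the class name ThreeSeparators and the method accepts are hypothetical, and it assumes that - and _ may be mixed freely, as in the question's own attempt with [_-].

```java
import java.util.regex.Pattern;

public class ThreeSeparators {
    // Assumption: after '#', exactly three separators (each '-' or '_'), then an
    // allowed extension that is not directly preceded by ".ok".
    static final Pattern STRICT = Pattern.compile(
        "^[^#]*#[^-_]*(?:[-_][^-_]*){3}(?<!\\.ok)\\.(?:docx?|odt)$");

    // Hypothetical helper: true when the whole file name satisfies the pattern.
    static boolean accepts(String name) {
        return STRICT.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Three separators after '#', plain .doc ending.
        System.out.println(accepts("CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.doc")); // true
        // Same name, but the extension is preceded by ".ok".
        System.out.println(accepts("CRS-ISAU-RPV#3430_Dedalus_Conc.ok.erto_AOTreviglio.ok.doc")); // false
        // Only two separators after '#'.
        System.out.println(accepts("A#1_2_3.doc")); // false
        // Four separators after '#'.
        System.out.println(accepts("A#1_2_3_4_5.doc")); // false
    }
}
```

Since [^-_]* cannot cross a separator, the {3} quantifier fixes the separator count, and the part before # may still contain any characters, including hyphens.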
Upvotes: 2