Reputation: 37710
In a C# program, I want to write a file into a folder where other files may already exist. If a file with the same name exists, a suffix is added: myfile.docx, myfile (1).docx, myfile (2).docx, and so on.
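The naming scheme above can be sketched as a small C# helper. This is a hypothetical illustration, not the asker's code: NextFreeName and its exists predicate are names introduced here, and the extension is passed in separately so that multi-part extensions such as .htm.tmpl stay intact.

```csharp
using System;

public static class FileNameSuffixer
{
    // Hypothetical helper: returns baseName + ext if that name is free,
    // otherwise the first "baseName (n)ext" (n = 1, 2, ...) that is free.
    // The caller decides what "taken" means through the exists predicate.
    public static string NextFreeName(string baseName, string ext, Func<string, bool> exists)
    {
        string candidate = baseName + ext;
        for (int n = 1; exists(candidate); n++)
        {
            candidate = $"{baseName} ({n}){ext}";
        }
        return candidate;
    }
}
```

With a real folder, exists could be name => File.Exists(Path.Combine(folder, name)) from System.IO. Checking and then writing is not atomic, so another process could still create the same name in between.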
I'm struggling to analyse the existing file names in order to extract their name parts.
Specifically, I use this regex:
(?<base>.+?)(\((?<idx>\d+)\)?)?(?<ext>(\.[\w\.]+))
This regex outputs:
╔═══════════════════════╦══════════════╦═════╦═══════════╦═══════════════════════════════════╗
║ Source Filename       ║ base         ║ idx ║ extension ║ Success                           ║
╠═══════════════════════╬══════════════╬═════╬═══════════╬═══════════════════════════════════╣
║ somefile.docx         ║ somefile     ║     ║ .docx     ║ Yes                               ║
║ somefile              ║              ║     ║           ║ No, base should be "somefile"     ║
║ somefile (6)          ║              ║     ║           ║ No, base should be "somefile (6)" ║
║ somefile (1).docx     ║ somefile     ║ 1   ║ .docx     ║ Yes                               ║
║ somefile (2)(1).docx  ║ somefile (2) ║ 1   ║ .docx     ║ Yes                               ║
║ somefile (4).htm.tmpl ║ somefile     ║ 4   ║ .htm.tmpl ║ Yes                               ║
╚═══════════════════════╩══════════════╩═════╩═══════════╩═══════════════════════════════════╝
As you can see, all cases work except when the file name has no extension.
How can I fix my regex so that the failing cases work?
Reproduction: https://regex101.com/r/q9uQii/1
If it matters, here is the relevant C# code:
private static readonly Regex g_fileNameAnalyser = new Regex(
    @"(?<base>.+?)(\((?<idx>\d+)\)?)?(?<ext>(\.[\w\.]+))",
    RegexOptions.Compiled | RegexOptions.ExplicitCapture
);
...
// Split a file name into its base name, "(n)" index and extension
var candidateMatch = g_fileNameAnalyser.Match(somefilename);
var candidateInfo = new
{
    baseName = candidateMatch.Groups["base"].Value.Trim(),
    // idx falls back to 0 when the name has no "(n)" part
    idx = candidateMatch.Groups["idx"].Success ? int.Parse(candidateMatch.Groups["idx"].Value) : 0,
    ext = candidateMatch.Groups["ext"].Value
};
Upvotes: 3
Views: 329
Reputation: 163477
What you might do is repeat the (digits) part while asserting that another such pair follows, so that only the last pair is left over. Then capture the digits of that last pair in the idx group.
Make the idx group and the ext group optional using a question mark.
^(?<base>[^\r\n.()]+(?:(?:\(\d+\))*(?=\(\d+\)))?)(?:\((?<idx>\d+)\))?(?<ext>(?:\.[\w\.]+))?$
^ - Start of string
(?<base> - Start base group
    [^\r\n.()]+ - Match 1+ times any char except \r, \n, ., ( and )
    (?: - Non-capturing group
        (?:\(\d+\))*(?=\(\d+\)) - Repeat matching (digits) until one (digits) part is left to the right
    )? - Close the group and make it optional
) - End base group
(?:\((?<idx>\d+)\))? - Optional part to match the idx group between ( and )
(?<ext>(?:\.[\w\.]+))? - Optional ext group
$ - End of string
Upvotes: 1
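The pattern from the answer above can be tried from C# with a sketch like the following. The class and method names are hypothetical; the options and the Trim() call mirror the question's code, and a non-matching name throws instead of silently returning empty groups.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadFileNameParser
{
    // Pattern copied verbatim from the answer; the options mirror the question's code.
    private static readonly Regex Analyser = new Regex(
        @"^(?<base>[^\r\n.()]+(?:(?:\(\d+\))*(?=\(\d+\)))?)(?:\((?<idx>\d+)\))?(?<ext>(?:\.[\w\.]+))?$",
        RegexOptions.Compiled | RegexOptions.ExplicitCapture);

    // Splits a file name into (base, idx, ext): idx is 0 when there is no "(n)" suffix
    // and ext is "" when there is no extension.
    public static (string Base, int Idx, string Ext) Parse(string fileName)
    {
        Match m = Analyser.Match(fileName);
        if (!m.Success)
            throw new ArgumentException($"Unrecognized file name: {fileName}", nameof(fileName));

        return (
            // base keeps the space before "(n)", so trim it as the question's code does
            m.Groups["base"].Value.Trim(),
            m.Groups["idx"].Success ? int.Parse(m.Groups["idx"].Value) : 0,
            m.Groups["ext"].Value);
    }
}
```

With this pattern, "somefile (2)(1).docx" gives base "somefile (2)" and idx 1, and "somefile (6)" gives base "somefile" with idx 6. Because base may not contain ".", a name such as "my.report.docx" is split into base "my" and ext ".report.docx".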
Reputation: 627220
You may use
^(?<base>.+?)\s*(?:\((?<idx>\d+)\))?(?<ext>\.[\w.]+)?$
See the regex demo.
Pattern details
^ - start of string
(?<base>.+?) - Group "base": any 1 or more chars other than newline, as few as possible
\s* - 0+ whitespaces
(?:\((?<idx>\d+)\))? - an optional sequence of:
    \( - a ( char
    (?<idx>\d+) - Group "idx": 1+ digits
    \) - a ) char
(?<ext>\.[\w.]+)? - an optional Group "ext":
    \. - a . char
    [\w.]+ - 1+ letters, digits, _ or . chars
$ - end of string.
Upvotes: 1
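The same kind of check for the pattern in the answer above, again as a hypothetical sketch with illustrative names. Unlike the first answer's pattern, base here may contain any character, and the \s* before the idx group keeps the space in front of "(n)" out of base, so Trim() is not needed for the sample names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyBaseFileNameParser
{
    // Pattern copied verbatim from the answer; the options mirror the question's code.
    private static readonly Regex Analyser = new Regex(
        @"^(?<base>.+?)\s*(?:\((?<idx>\d+)\))?(?<ext>\.[\w.]+)?$",
        RegexOptions.Compiled | RegexOptions.ExplicitCapture);

    // Splits a file name into (base, idx, ext): idx is 0 when there is no "(n)" suffix
    // and ext is "" when there is no extension.
    public static (string Base, int Idx, string Ext) Parse(string fileName)
    {
        Match m = Analyser.Match(fileName);
        if (!m.Success)
            throw new ArgumentException($"Unrecognized file name: {fileName}", nameof(fileName));

        return (
            // the lazy base stops before \s*, so the space before "(n)" is not captured
            m.Groups["base"].Value,
            m.Groups["idx"].Success ? int.Parse(m.Groups["idx"].Value) : 0,
            m.Groups["ext"].Value);
    }
}
```

For the names in the question's table it yields the same parts as the first answer's pattern. "my.report.docx" is likewise split into base "my" and ext ".report.docx", because the lazy base stops at the first "." from which [\w.]+ can reach the end of the name.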