Reputation: 5870
I am sure that has been asked before, but I cannot find the appropriate question(s).
Being new to C#'s Regex, I want to mimic what is possible e.g. with sed and awk, where I would write s/_(20[0-9]{2})[.0-9]{1}/\1/g in order to obtain a 4-digit year number after 2000 which has an underscore as prefix and a digit or a dot afterwards. The \1 refers to the value within the brackets.
Example: Both files fx_201902.csv and fx_2019.csv should give me back myYear=2019. I was not successful with:
string myYear = Regex.Replace(Path.GetFileName(x), @"_20([0-9]{2})[.0-9]{1}", "\1")
How do I have to escape? Or is this kind of replacement not possible? If so, how would I do that?
Edit: My issue is how to do the \1 in C#, in other words how to extract a regex variable. Please forgive the typos in the original post - I am trying the new SO app and I submitted earlier than intended.
Upvotes: 0
Views: 235
Reputation: 529
Your second example does not contain the month's digits. If you still want to capture them, make that group optional:
Regex.Replace(Path.GetFileName(x), @"_20([1-9]{2})([.0-9]{2})?", "\1")
Note that I only added 3 characters to your query: (, ) and ?
If you want the returned value to be as expected, change the replacement from \1 to $1 as documented (with the correct parentheses), and capture 2020, 2030, etc. (still excluding 2000) by using the alternation operator | to combine [0-9]{1} and [1-9]{1}:
Regex.Replace(Path.GetFileName(x), @"_(20(([1-9]{1})([0-9]{1})|([0-9]{1})([1-9]{1})))([.0-9]{2})?", "$1")
It is worth mentioning that, when the first alternative matches, $3 and $4 hold the second-to-last and the last digit, and $2 holds the last 2 digits (i.e. the combination [1-9]{1}[0-9]{1} | [0-9]{1}[1-9]{1}).
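As a minimal sketch of the difference between the two replacement strings, the snippet below applies the unanchored pattern from the question's sed command. The class and method names are made up for illustration. It relies on two facts: .NET substitutions use $1 rather than \1, and Regex.Replace only swaps out the matched portion, so the rest of the file name survives unless the pattern also consumes it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class ReplaceDemo
{
    // Unanchored pattern taken from the sed command in the question.
    public const string Pattern = @"_(20[0-9]{2})[.0-9]";

    // "$1" is the .NET substitution for the first capturing group.
    public static string WithDollar(string fileName) =>
        Regex.Replace(fileName, Pattern, "$1");

    // A backslash has no special meaning in a .NET replacement string,
    // so @"\1" is inserted literally. (A non-verbatim "\1" does not even
    // compile, because \1 is not a valid C# escape sequence.)
    public static string WithBackslash(string fileName) =>
        Regex.Replace(fileName, Pattern, @"\1");

    public static void Main()
    {
        Console.WriteLine(WithDollar("fx_2019.csv"));    // fx2019csv
        Console.WriteLine(WithDollar("fx_201902.csv"));  // fx20192.csv
        Console.WriteLine(WithBackslash("fx_2019.csv")); // fx\1csv
    }
}
```

To end up with only the year, either let the pattern match the whole file name or read the group from a Match instead of replacing.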
Upvotes: 0
Reputation: 37367
I'd suggest a more robust regex: _(20(?:0[1-9]|[1-9][0-9]))[\d.]
Explanation:
_ - match _ literally
(...) - first capturing group
20 - match 20 literally
(?:...) - non-capturing group
0[1-9]|[1-9][0-9] - alternation: match 0 followed by a non-zero digit, OR a non-zero digit followed by any digit; this matches any year from 2001 to 2099
[\d.] - match a dot or a digit
And below is how you use capturing groups:
var regex = new Regex(@"_(20(?:0[1-9]|[1-9][0-9]))[\d.]");
regex.Match("fx_201902.csv").Groups[1].Value;
// "2019"
regex.Match("fx_20190.csv").Groups[1].Value;
// "2019"
regex.Match("fx_2019.csv").Groups[1].Value;
// "2019"
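One caveat with reading Groups[1].Value directly: when the input does not match, Match returns an unsuccessful match and the group value is an empty string. A small wrapper that checks Success makes that case explicit. This is only a sketch; the class name YearExtractor and the method TryExtractYear are hypothetical helpers, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class YearExtractor
{
    private static readonly Regex YearPattern =
        new Regex(@"_(20(?:0[1-9]|[1-9][0-9]))[\d.]");

    // Returns true and sets 'year' when the file name contains a year
    // from 2001 to 2099 after an underscore; otherwise returns false.
    public static bool TryExtractYear(string fileName, out string year)
    {
        Match m = YearPattern.Match(fileName);
        year = m.Success ? m.Groups[1].Value : "";
        return m.Success;
    }

    public static void Main()
    {
        foreach (string name in new[] { "fx_201902.csv", "fx_2019.csv", "fx_2000.csv" })
        {
            if (TryExtractYear(name, out string year))
                Console.WriteLine(name + " -> " + year);
            else
                Console.WriteLine(name + " -> no match");
        }
    }
}
```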
Upvotes: 1
Reputation: 163342
You might use a capturing group for the first 4 digits and match what is before and after the 4 digits.
.*_(20[0-9]{2})[0-9]*\.\w+$
Explanation
.*_ - Match up to and including the last underscore
(20[0-9]{2}) - Capture 20 and 2 digits in group 1
[0-9]*\. - Match 0 or more occurrences of a digit followed by a dot
\w+$ - Match 1 or more word chars till the end of the string
In the replacement use: $1
For example
string[] strings = {"fx_2019.csv", "fx_201902.csv"};
foreach (string s in strings)
{
    string myYear = Regex.Replace(s, @".*_(20[0-9]{2})[0-9]*\.\w+$", "$1");
    Console.WriteLine(myYear);
}
Output
2019
2019
Upvotes: 1
Reputation: 147166
To extract the year using Regex.Replace, you need to capture only the year part of the string into a group and replace the entire string with just the capture group. That means you need to also match the characters before and after the year using (for example)
^.*_(20[0-9]{2})[.0-9].*$
That can then be replaced with $1, e.g.
Regex r = new Regex(@"^.*_(20[0-9]{2})[.0-9].*$");
string filename = "fx_201902.csv";
string myYear = r.Replace(filename, "$1");
Console.WriteLine(myYear);
filename = "fx_2019.csv";
myYear = r.Replace(filename, "$1");
Console.WriteLine(myYear);
Output:
2019
2019
If you want to exclude the year 2000 from your match, change the regex to
^.*_(20(?:0[1-9]|[1-9][0-9]))[.0-9].*$
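One thing to keep in mind with the Replace approach, shown here as a rough sketch: if the pattern does not match at all (for instance because the year is 2000), Regex.Replace returns the input unchanged rather than an empty string. The class and method names below are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class AnchoredReplaceDemo
{
    // Anchored pattern that excludes the year 2000.
    private static readonly Regex R =
        new Regex(@"^.*_(20(?:0[1-9]|[1-9][0-9]))[.0-9].*$");

    // Returns the year when the pattern matches,
    // otherwise the original file name unchanged.
    public static string YearOrInput(string fileName) =>
        R.Replace(fileName, "$1");

    public static void Main()
    {
        Console.WriteLine(YearOrInput("fx_201902.csv")); // 2019
        Console.WriteLine(YearOrInput("fx_2019.csv"));   // 2019
        Console.WriteLine(YearOrInput("fx_2000.csv"));   // fx_2000.csv
    }
}
```

If the caller needs to tell the two cases apart, checking R.IsMatch(fileName) first is one option.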
Upvotes: 1