Reputation: 331102
I have music file names like:
Gorillaz (2001)
Gorillaz (7th State Mix) (2002)
Gorillaz (2001) (Featuring Travis)
Gorillaz (1Mix) (2003)
Gorillaz (1000) (2001)
How do I parse the year in the cleanest, easiest way?
Right now I am parsing them by finding each '(' and then making sure that the text between the parentheses is 4 characters long, starts with 1 or 2, and can be parsed using TryParse.
Can I parse these kinds of strings using a single Regex?
The year can be at most 50-60 years old, so no earlier than 1950.
Upvotes: 0
Views: 469
Reputation: 415881
Looks tricky unless we know more about what some of those parentheses are for: if you could have a "(1000)" that's not really a year, you could probably have a "(2000)" that's not really a year also. I'm talking about the last line in your sample:
Gorillaz (1000) (2001)
If that's valid, why not something like this?
Gorillaz (2000) (2001)
Where the (2000) in the latter example fills the same conceptual role as the (1000) from the former (it's not a year). How will your regex know which is the year? If you know this won't happen, how do you know it won't happen?
Upvotes: 0
Reputation: 19891
Considering you have things that look like years but aren't, e.g. (1000), I would look for 19**, 20**, and maybe 21** if you think your program is going to be around for a while :)
/\((19\d\d|20\d\d|21\d\d)\)/
For your inputs, capture group 1 gives:
2001
2002
2001
2003
2001
2001
Upvotes: 1
Reputation: 1500873
I think this does what you're after:
using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        string[] samples = new[] { "Gorillaz (2001)",
                                   "Gorillaz (7th State Mix) (2002)",
                                   "Gorillaz (2001) (Featuring Travis)",
                                   "Two matches: (2002) (1950)",
                                   "Gorillaz (1Mix) (1952)",
                                   "Gorillaz (1Mix) (2003)",
                                   "Gorillaz (1000) (2001)" };
        foreach (string name in samples)
        {
            ShowMatches(name);
        }
    }

    static readonly Regex YearRegex = new Regex(@"\((19[5-9]\d|200\d)\)");

    static void ShowMatches(string name)
    {
        Console.WriteLine("Matches for: {0}", name);
        foreach (Match match in YearRegex.Matches(name))
        {
            Console.WriteLine(match.Value);
        }
    }
}
That will work as far as 2009. To make it work beyond that, use @"\((19[5-9]\d|20[01]\d)\)" etc.
Note that this still prints out the brackets. You could get rid of them with a group construct, but personally I'd just use Substring :)
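For comparison, here is a minimal sketch of the group-based alternative mentioned above. It assumes the pattern extended to 2019, and the class name YearGroupDemo and the helper FirstYear are hypothetical names made up for this illustration. Group 1 holds only the digits, so no Substring call is needed.

```csharp
using System;
using System.Text.RegularExpressions;

static class YearGroupDemo
{
    // Pattern from this answer, extended to 2019; group 1 captures the digits only.
    static readonly Regex YearRegex = new Regex(@"\((19[5-9]\d|20[01]\d)\)");

    // Hypothetical helper: returns the first bracketed year without its brackets,
    // or null when the name contains no year in the accepted range.
    public static string FirstYear(string name)
    {
        Match match = YearRegex.Match(name);
        return match.Success ? match.Groups[1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(FirstYear("Gorillaz (7th State Mix) (2002)")); // 2002
        Console.WriteLine(FirstYear("Gorillaz (1000) (2001)"));          // 2001
    }
}
```

The Substring route would be match.Value.Substring(1, 4) on the same match; which one reads better is largely a matter of taste.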
Upvotes: 8
Reputation: 10061
You should be able to match this using a regex. Here is a pattern you might try:
\([12][0-9]{3}\)
Don't forget to enable greedy. This will match the (1000) on the last line, as well. Is this wanted, too?
Edit:
\((19|20)[0-9]{2}\)
will do the job if you don't want the (1000) as a match
regards
Upvotes: 1
Reputation: 29403
This regex will match your pattern:
@"\(([12]\d{3})\)"
You can then extract Group 1 to get the year, use Convert.ToInt32 to get it as an int, and check that it is at least 1950 (it's probably better to do this as a numeric comparison rather than overcomplicating the regex).
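A rough sketch of that approach, assuming the bounds are passed in by the caller. The class name YearRangeDemo, the helper FindYears, and the minYear/maxYear parameters are hypothetical names for this illustration, and the upper bound of the current year is an assumption the question does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class YearRangeDemo
{
    // Loose pattern: any four digits starting with 1 or 2, inside parentheses.
    static readonly Regex CandidateRegex = new Regex(@"\(([12]\d{3})\)");

    // Hypothetical helper: collects every bracketed number that falls
    // within [minYear, maxYear]; the range check is done numerically.
    public static List<int> FindYears(string name, int minYear, int maxYear)
    {
        var years = new List<int>();
        foreach (Match match in CandidateRegex.Matches(name))
        {
            int year = Convert.ToInt32(match.Groups[1].Value);
            if (year >= minYear && year <= maxYear)
            {
                years.Add(year);
            }
        }
        return years;
    }

    static void Main()
    {
        // (1000) matches the pattern but is rejected by the numeric check.
        foreach (int year in FindYears("Gorillaz (1000) (2001)", 1950, DateTime.Now.Year))
        {
            Console.WriteLine(year); // 2001
        }
    }
}
```

Keeping the range out of the pattern means the cutoff can change without touching the regex.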
Upvotes: 2