Joan Venge

Reputation: 331102

Cleanest way to parse this pattern of strings?

I have music file names like:

Gorillaz (2001)
Gorillaz (7th State Mix) (2002)
Gorillaz (2001) (Featuring Travis)
Gorillaz (1Mix) (2003)
Gorillaz (1000) (2001)

How do I parse the year in the cleanest, easiest way?

Right now I parse them by finding each '(' and checking that there are exactly 4 characters between the parentheses, that the first character is 1 or 2, and that the value can be parsed with TryParse.
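For comparison with the regex answers, here is a minimal C# sketch of the manual approach just described. It is an illustration under assumptions, not the asker's actual code: `ManualYearParser` and `FindYearCandidates` are hypothetical names, and it returns every candidate rather than picking one. Note that it still accepts "(1000)", which is exactly the ambiguity the answers discuss.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Sketch of the manual (non-regex) approach described in the question.
// ManualYearParser and FindYearCandidates are hypothetical names.
public static class ManualYearParser
{
    // Returns every parenthesized value that is exactly four characters,
    // starts with '1' or '2', and parses as an integer.
    public static List<int> FindYearCandidates(string name)
    {
        var years = new List<int>();
        int open = name.IndexOf('(');
        while (open >= 0)
        {
            int close = name.IndexOf(')', open + 1);
            if (close < 0)
                break; // '(' without a matching ')': stop scanning

            string inner = name.Substring(open + 1, close - open - 1);

            // NumberStyles.None rejects signs and surrounding whitespace,
            // so only plain digits pass TryParse.
            if (inner.Length == 4
                && (inner[0] == '1' || inner[0] == '2')
                && int.TryParse(inner, NumberStyles.None, CultureInfo.InvariantCulture, out int year))
            {
                years.Add(year);
            }

            open = name.IndexOf('(', close + 1);
        }
        return years;
    }
}
```

For "Gorillaz (1000) (2001)" this yields both 1000 and 2001, so some extra rule (such as a minimum year) is still needed.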

Can I parse these kinds of strings using a single Regex?


Edit:

The year can be at most 50-60 years old, so nothing earlier than 1950.

Upvotes: 0

Views: 469

Answers (5)

Joel Coehoorn

Reputation: 415881

Looks tricky unless we know more about what some of those parentheses are for: if you could have a "(1000)" that's not really a year, you could probably have a "(2000)" that's not really a year either. I'm talking about the last line in your sample:

Gorillaz (1000) (2001)

If that's valid, why not something like this?

Gorillaz (2000) (2001)

Here the (2000) in the latter example fills the same conceptual role as the (1000) in the former (it's not a year). How will your regex know which one is the year? If you know this won't happen, how do you know it won't happen?
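To make the point concrete, a small hypothetical C# demo (the class and method names are made up for illustration): a "year-looking" pattern finds two candidates in "Gorillaz (2000) (2001)", so the pattern alone cannot say which one is the release year.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo: a "year-looking" pattern finds two candidates in the
// ambiguous title, so the regex by itself cannot pick the release year.
public static class AmbiguityDemo
{
    // Four digits in parentheses, starting with 19 or 20.
    static readonly Regex YearLike = new Regex(@"\((19|20)\d{2}\)");

    // Returns every match, including the surrounding parentheses.
    public static string[] YearLikeGroups(string name) =>
        YearLike.Matches(name).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Choosing between "(2000)" and "(2001)" needs a rule that comes from knowledge of the naming scheme, not from the regex.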

Upvotes: 0

naumcho

Reputation: 19891

Since you have things that look like years but aren't, e.g. (1000), I would look for 19**, 20**, and maybe 21** if you think your program is going to be around for a while :)

/\((19\d\d|20\d\d|21\d\d)\)/

For your inputs, capture group 1 gives:

2001
2002
2001
2003
2001
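Why the alternation needs its own group: without it, `|` splits the whole pattern into `\(19\d\d`, `20\d\d` and `21\d\d\)`, so the parentheses only constrain the first and last alternatives. A minimal C# sketch comparing both forms (names are hypothetical; the `/.../` delimiters are dropped because .NET does not use them):

```csharp
using System.Text.RegularExpressions;

// Compares the ungrouped alternation with a grouped one.
public static class AlternationDemo
{
    // Ungrouped: '|' binds loosest, so "20\d\d" can match a bare 2001.
    public static readonly Regex Ungrouped = new Regex(@"\(19\d\d|20\d\d|21\d\d\)");

    // Grouped: every alternative must sit inside parentheses,
    // and capture group 1 holds just the year.
    public static readonly Regex Grouped = new Regex(@"\((19\d\d|20\d\d|21\d\d)\)");

    // Returns the year from the first grouped match, or null if none.
    public static string YearOrNull(string name)
    {
        Match m = Grouped.Match(name);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, the ungrouped form matches "Track 2001 remaster" even though there are no parentheses, while the grouped form does not.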

Upvotes: 1

Jon Skeet

Reputation: 1500873

I think this does what you're after:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        string[] samples = new[] { "Gorillaz (2001)",
                "Gorillaz (7th State Mix) (2002)",
                "Gorillaz (2001) (Featuring Travis)",
                "Two matches: (2002) (1950)",
                "Gorillaz (1Mix) (1952)",
                "Gorillaz (1Mix) (2003)",
                "Gorillaz (1000) (2001)" };

        foreach (string name in samples)
        {
            ShowMatches(name);
        }
    }

    static readonly Regex YearRegex = new Regex(@"\((19[5-9]\d|200\d)\)");

    static void ShowMatches(string name)
    {
        Console.WriteLine("Matches for: {0}", name);
        foreach (Match match in YearRegex.Matches(name))
        {
            Console.WriteLine(match.Value);
        }
    }
}

That will work up to 2009. To make it work beyond that, use @"\((19[5-9]\d|20[01]\d)\)" and so on.

Note that that still prints out the brackets - you could get rid of them with a group construct, but personally I'd just use Substring :)
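A minimal sketch of both options for dropping the brackets, using the extended pattern `\((19[5-9]\d|20[01]\d)\)` (which covers 1950 to 2019). `BracketFreeYear`, `ViaGroup` and `ViaSubstring` are hypothetical names, not part of the answer above.

```csharp
using System.Text.RegularExpressions;

// Two ways to get the year without brackets from a "(yyyy)" match.
public static class BracketFreeYear
{
    static readonly Regex YearRegex = new Regex(@"\((19[5-9]\d|20[01]\d)\)");

    // Option 1: read capture group 1, which excludes the brackets.
    public static string ViaGroup(string name)
    {
        Match m = YearRegex.Match(name);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Option 2: m.Value is "(yyyy)", so skip the '(' and drop the ')'.
    public static string ViaSubstring(string name)
    {
        Match m = YearRegex.Match(name);
        return m.Success ? m.Value.Substring(1, m.Length - 2) : null;
    }
}
```

Both return the same string; the group version keeps working if the pattern later grows extra text around the year.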

Upvotes: 8

Atmocreations

Reputation: 10061

You should be able to match this with a regex. Here is a pattern you might try:

\([12][0-9]{3}\)

Greedy matching is the default, so nothing needs enabling. This will also match the (1000) on the last line. Is that wanted?

Edit:

 \((19|20)[0-9]{2}\)

will do the job if you don't want the (1000) as a match
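A hypothetical C# comparison of the two patterns on the ambiguous last line. One detail worth knowing: in the edited pattern, capture group 1 is only "19" or "20", not the whole year, so the sketch takes the four digits from the match text instead.

```csharp
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // First suggestion: any four digits in parentheses starting with 1 or 2.
    public static readonly Regex Loose = new Regex(@"\([12][0-9]{3}\)");

    // Edited suggestion: only (19xx) or (20xx). Group 1 captures just
    // the "19"/"20" prefix, not the full year.
    public static readonly Regex Strict = new Regex(@"\((19|20)[0-9]{2}\)");

    // Returns the four digits inside the first Strict match, or null.
    public static string StrictYearOrNull(string name)
    {
        Match m = Strict.Match(name);
        return m.Success ? m.Value.Substring(1, 4) : null;
    }
}
```

On "Gorillaz (1000) (2001)" the loose pattern finds two matches and the strict one finds only "(2001)".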

regards

Upvotes: 1

Ben Lings

Reputation: 29403

This regex will match your pattern:

@"\(([12]\d{3})\)"

Group 1 then holds the year. Convert it to an int with Convert.ToInt32 and check that it is greater than 1950; it's probably better to do this as a numeric comparison than to overcomplicate the regex.
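A minimal sketch of "regex for the shape, numeric comparison for the range", under a few assumptions: `YearExtractor`, `FindYear` and the `minYear` parameter are hypothetical; 1950 itself is treated as valid, reading the question's "not older than 1950"; and `[0-9]` replaces `\d`, because in .NET `\d` also matches non-ASCII Unicode digits, which Convert.ToInt32 would reject.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YearExtractor
{
    // Shape check only: four digits in parentheses, first digit 1 or 2.
    // [0-9] instead of \d keeps the match to ASCII digits.
    static readonly Regex Candidate = new Regex(@"\(([12][0-9]{3})\)");

    // Range check in code: returns the first candidate >= minYear, or null.
    public static int? FindYear(string name, int minYear = 1950)
    {
        foreach (Match m in Candidate.Matches(name))
        {
            int year = Convert.ToInt32(m.Groups[1].Value);
            if (year >= minYear)
                return year;
        }
        return null;
    }
}
```

Changing the cutoff (or adding an upper bound) then means editing a number, not rewriting the pattern.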

Upvotes: 2
