Newbie

Reputation: 1111

Extract dates from filename

I need to extract dates from file names whose general pattern is [filename_]YYYYMMDD[.fileExtension]

e.g. "xxx_20100326.xls" or "x2v_20100326.csv"

The program below does the job:

// The number of characters in the substring is set to 8
// since the length of YYYYMMDD is 8

public static string ExtractDatesFromFileNames(string fileName)
{
    return fileName.Substring(fileName.IndexOf("_") + 1, 8);
}

Is there a better way of achieving the same thing?

I am basically looking for standard practice.

I am using C# 3.0 and .NET Framework 3.5.

Edit:

I liked lc.'s solution and the way it was explained. I have used his pattern like this:

string regExPattern = "^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\\..*)?$";
string result =  Regex.Match(fileName, @regExPattern).Groups[1].Value;

The input to the function is: "x2v_20100326.csv"

But the output is 2010 instead of 20100326 (which is the expected one).

Can anyone please help?

Upvotes: 4

Views: 8635

Answers (3)

lc.

Reputation: 116458

I would use a regex, especially if it's possible there's more than one underscore in the filename. Then you can capture the year, month, day and return a DateTime if necessary. This way you can make sure you're extracting the right portion of the filename and it indeed matches the pattern you're searching for.

For the pattern [filename_]YYYYMMDD[.fileExtension], I'm thinking something like:

^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\..*)?$

Then your captured groups will be year, month, and day, in that order.

Explanation:

^: The beginning of your string.

(?:.*_)?: An optional non-capturing group, containing any number of characters followed by an underscore.

([0-9]{4}): A capturing group containing exactly four digits.

([0-9]{2}): A capturing group containing exactly two digits (it appears twice, once for the month and once for the day).

(?:\..*)?: An optional non-capturing group, containing a dot followed by any number of characters.

$: The end of your string.

However, I will add that if you're sure your filenames have one and only one underscore, and the date follows that underscore, the code you have is cleaner and will probably be slightly faster than the regex. It's something to keep in mind based on the expected input set.
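Regarding the edit in the question: Groups[1] holds only the first capturing group (the year), which is why the result is 2010. Groups[0] is the entire match, which here would be the whole file name. A minimal sketch of one way to get the full date text is to join groups 1 to 3. The class and method names below are made up for illustration, and returning null for a non-matching name is an assumption about the desired behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameDates
{
    // Pattern from this answer; groups 1-3 are year, month and day.
    private static readonly Regex DatePattern = new Regex(
        @"^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\..*)?$");

    // Hypothetical helper: returns the YYYYMMDD text,
    // or null when the file name does not match the pattern.
    public static string ExtractDateText(string fileName)
    {
        Match m = DatePattern.Match(fileName);
        if (!m.Success)
            return null;

        // Join year + month + day; Groups[1] alone is just the year.
        return m.Groups[1].Value + m.Groups[2].Value + m.Groups[3].Value;
    }
}
```

With this sketch, FileNameDates.ExtractDateText("x2v_20100326.csv") gives "20100326". Wrapping the three digit groups in one outer capturing group would work as well, at the cost of shifting the group numbers.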

Upvotes: 3

Ahmad Mageed

Reputation: 96477

The code you have is sufficient as long as you are sure of the input being that standard format. If there's a chance it won't be then you should add some error handling for scenarios where there is no underscore, or the days/months aren't represented by 2 digits (which will mess up the 8 character substring count), followed by a DateTime.TryParse to ensure it's a real date.

Your other options are:

  • Regex: overkill for such a well-defined pattern.
  • LINQ: using the SkipWhile, Skip, TakeWhile methods to ignore the underscore and capture the numbers until the period is encountered. This query ends up looking confusing and the result needs to be transformed to a string.
  • String.Split: split on { '_', '.' } and use the array element representing the date.

None of these options are going to yield code that looks clearer than what you already have and performance probably won't be any better.
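For comparison, the String.Split option could look roughly like the sketch below. It assumes the date is the first segment that is exactly eight ASCII digits. The class and method names are hypothetical, and returning null when nothing is found is an assumption.

```csharp
using System;

public static class SplitExample
{
    // Hypothetical helper: splits on '_' and '.' and returns the first
    // segment that is exactly eight ASCII digits, or null if there is none.
    public static string ExtractDateBySplit(string fileName)
    {
        foreach (string part in fileName.Split('_', '.'))
        {
            if (part.Length == 8 && IsAllDigits(part))
                return part;
        }
        return null;
    }

    // True when every character is in the range '0'..'9'.
    private static bool IsAllDigits(string text)
    {
        foreach (char c in text)
        {
            if (c < '0' || c > '9')
                return false;
        }
        return true;
    }
}
```

Checking each segment instead of taking a fixed array index keeps the sketch working when the name has extra underscores or no extension, but it is clearly more code than the original one-liner.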

Upvotes: 2

EMP

Reputation: 61971

The code you've got is fine, except that you may want to check the return value of IndexOf in case you encounter a file with no _, i.e.

 int index = fileName.IndexOf("_");
 if (index != -1)
     return fileName.Substring(index + 1, 8);
 else
     ...

If you want to check whether it's a valid date, you can then call DateTime.TryParseExact.
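Putting the two checks together might look like the sketch below. The class and method names are hypothetical, and returning a null DateTime? on failure is an assumption, since the original leaves the else branch open. It also guards against names that are too short for the 8-character substring, and uses the char overload of IndexOf.

```csharp
using System;
using System.Globalization;

public static class SubstringExample
{
    // Hypothetical helper: returns the parsed date, or null when there is
    // no underscore, too few characters after it, or no valid date.
    public static DateTime? TryExtractDate(string fileName)
    {
        int index = fileName.IndexOf('_');
        if (index == -1 || fileName.Length < index + 1 + 8)
            return null;

        string text = fileName.Substring(index + 1, 8);

        DateTime date;
        // "yyyyMMdd" matches the YYYYMMDD layout exactly; the invariant
        // culture keeps the result independent of the machine's settings.
        if (DateTime.TryParseExact(text, "yyyyMMdd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
            return date;

        return null;
    }
}
```

For "x2v_20100326.csv" this yields 26 March 2010, while something like "x2v_20101340.csv" (month 13) yields null.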

Upvotes: 0
