Reputation: 1111
I have a situation where I need to extract dates from file names whose general pattern is [filename_]YYYYMMDD[.fileExtension],
e.g. "xxx_20100326.xls" or "x2v_20100326.csv".
The following method does the job:
// The number of characters in the substring is set to 8
// since YYYYMMDD is 8 characters long
public static string ExtractDatesFromFileNames(string fileName)
{
    return fileName.Substring(fileName.IndexOf("_") + 1, 8);
}
Is there a better way to achieve the same thing?
I am basically looking for standard practice.
I am using C# 3.0 and .NET Framework 3.5.
Edit:
I liked LC's solution and the way it was explained. I have used his pattern like this:
string regExPattern = "^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\\..*)?$";
string result = Regex.Match(fileName, @regExPattern).Groups[1].Value;
The input to the function is "x2v_20100326.csv",
but the output is 2010 instead of the expected 20100326.
Can anyone please help?
Upvotes: 4
Views: 8635
Reputation: 116458
I would use a regex, especially if there could be more than one underscore in the filename. Then you can capture the year, month, and day and return a DateTime if necessary. This way you can make sure you're extracting the right portion of the filename and that it actually matches the pattern you're searching for.
For the pattern [filename_]YYYYMMDD[.fileExtension], I'm thinking something like:
^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\..*)?$
Then your captured groups will be year, month, and day, in that order.
Explanation:
^ : The beginning of the string.
(?:.*_)? : An optional non-capturing group containing any number of characters followed by an underscore.
([0-9]{4}) : A capturing group containing exactly four digits (the year).
([0-9]{2}) : A capturing group containing exactly two digits (the month); the same group appears a second time for the day.
(?:\..*)? : An optional non-capturing group containing a dot followed by any number of characters (the extension).
$ : The end of the string.
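This also explains the result in the question's edit: Groups[1] is only the first capture group, the year, so it returns 2010. Groups[0] is the whole match, which for this anchored pattern is the entire file name, not the date. A minimal sketch, assuming C# 3.0 / .NET 3.5 compatibility; the class and method names (RegexDateExtractor, ExtractDateString, ExtractDate) are hypothetical:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RegexDateExtractor
{
    // Verbatim string (@"...") so "\." reaches the regex engine unchanged.
    private static readonly Regex DatePattern =
        new Regex(@"^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\..*)?$");

    // Returns "YYYYMMDD", or null if the name does not match the pattern.
    public static string ExtractDateString(string fileName)
    {
        if (fileName == null)
            return null;

        Match match = DatePattern.Match(fileName);
        if (!match.Success)
            return null;

        // Groups[1], [2] and [3] hold the year, month and day respectively;
        // joining all three gives back the full YYYYMMDD string.
        return match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value;
    }

    // Returns the date, or null if there is no match or the digits are not a real date.
    public static DateTime? ExtractDate(string fileName)
    {
        string digits = ExtractDateString(fileName);

        // The regex only checks digit counts; TryParseExact also rejects
        // impossible dates such as 20101326 (month 13).
        DateTime date;
        if (digits != null &&
            DateTime.TryParseExact(digits, "yyyyMMdd", CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out date))
            return date;

        return null;
    }
}
```

For example, ExtractDateString("x2v_20100326.csv") should give "20100326", and ExtractDate("a_b_20100326.csv") should give 26 March 2010, since the greedy .* lets the optional prefix group absorb earlier underscores.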
However, I will add that if you're sure your filenames have one and only one underscore, and the date follows that underscore, the code you have is cleaner and will probably be slightly faster than the regex. It's something to keep in mind based on the expected input set.
Upvotes: 3
Reputation: 96477
The code you have is sufficient as long as you are sure the input is always in that standard format. If there's a chance it won't be, you should add error handling for cases where there is no underscore, or where the days or months aren't represented by 2 digits (which will throw off the 8-character substring count), followed by a DateTime.TryParse (or DateTime.TryParseExact with the "yyyyMMdd" format) to ensure it's a real date.
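A minimal sketch of that defensive approach, still without a regex. The boundary rules are assumptions, not part of the answer: the date starts after the last underscore (or at the beginning if there is none, matching the optional [filename_] part), it ends at the next dot or at the end of the name, and DateTime.TryParseExact with "yyyyMMdd" does the validation. The class and method names are hypothetical; only C# 3.0 features are used.

```csharp
using System;
using System.Globalization;

public static class SubstringDateExtractor
{
    // Returns the date in "[filename_]YYYYMMDD[.fileExtension]",
    // or null if it is missing, malformed, or not a real calendar date.
    public static DateTime? ExtractDate(string fileName)
    {
        if (fileName == null)
            return null;

        // LastIndexOf returns -1 when there is no underscore, so start becomes 0.
        // Using the last underscore also copes with names like "a_b_20100326.csv".
        int start = fileName.LastIndexOf('_') + 1;

        // The date ends at the first dot after it, or at the end of the string.
        int end = fileName.IndexOf('.', start);
        if (end == -1)
            end = fileName.Length;

        string candidate = fileName.Substring(start, end - start);

        // Rejects anything that is not exactly yyyyMMdd and a valid date.
        DateTime date;
        if (DateTime.TryParseExact(candidate, "yyyyMMdd", CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out date))
            return date;

        return null;
    }
}
```

Because the candidate is bounded by the underscore and the dot rather than a fixed 8-character count, a malformed name such as "x_2010326.csv" is rejected instead of being silently misread as "2010326.".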
Your other options are:
- Use the LINQ SkipWhile, Skip, and TakeWhile methods to skip past the underscore and capture the digits until the period is encountered. This query ends up looking confusing, and the result needs to be transformed back into a string.
- Split the file name on { '_', '.' } and use the array element representing the date.

None of these options is going to yield code that looks clearer than what you already have, and performance probably won't be any better.
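For comparison, the two alternatives in that list might look roughly like this. It is a sketch assuming exactly one underscore with the date right after it, as in the question's examples; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class AlternativeExtractors
{
    // LINQ version: a string is an IEnumerable<char>, so the query works on characters.
    public static string ExtractWithLinq(string fileName)
    {
        char[] digits = fileName
            .SkipWhile(c => c != '_')   // drop the "filename" part
            .Skip(1)                    // drop the underscore itself
            .TakeWhile(c => c != '.')   // keep characters up to the extension
            .ToArray();

        // The query yields characters, so it has to be turned back into a string.
        return new string(digits);
    }

    // Split version: "x2v_20100326.csv" -> { "x2v", "20100326", "csv" }.
    public static string ExtractWithSplit(string fileName)
    {
        string[] parts = fileName.Split(new[] { '_', '.' });

        // Guard against names with no underscore or dot at all.
        return parts.Length > 1 ? parts[1] : null;
    }
}
```

Both return the raw text rather than a validated date, which is part of why the answer considers them no improvement over the original Substring call.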
Upvotes: 2
Reputation: 61971
The code you've got is fine, except that you may want to check the return value of IndexOf in case you encounter a file with no _, i.e.:
int index = fileName.IndexOf("_");
if (index != -1)
    return fileName.Substring(index + 1, 8);
else
    ...
If you want to check whether it's a valid date, you can then call DateTime.TryParseExact.
Upvotes: 0