Reputation: 1212
I want to extract the file name 13572_BranchInformationReport_2012-06-28.zip
from the following text:
1:30","/icons/def13572_BranchInformationReport_2012-06-28.zip","13572_BranchInformationReport_2012-06-28.zip",0,"184296","Jun 28
The regular expression code I am using is:
var fileNames = from Match m in Regex.Matches(pageSource, @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)")
select m.Value;
Which should work fine.
What am I missing?
Upvotes: 1
Views: 575
Reputation: 37192
You could try the following regex:
\d{5}_\w*_\d{4}-\d{2}-\d{2}\.(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)
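This will match anything that starts with exactly five digits and an underscore, continues with any run of word characters (letters, digits, or underscores) followed by an underscore, then has a date in the form 0000-00-00, a literal dot, and one of the listed extensions.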
PowerShell example:
$text = '1:30","/icons/def13572_BranchInformationReport_2012-06-28.zip","13572_BranchInformationReport_2012-06-28.zip",0,"184296","Jun 28'
$regex = '\d{5}_\w*_\d{4}-\d{2}-\d{2}\.(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)'
$text -match $regex
$matches[0]
Upvotes: 1
Reputation: 4066
Try the following regular expression:
[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)(?=",")
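The lookahead (?=",") only accepts a match that is immediately followed by the three characters "," and it does not include them in the result. Below is a rough C# sketch of how this pattern could be used against the sample line from the question. The class and method names are made up for illustration. Note that on this sample the only occurrence followed by "," is the one inside the /icons/ path, since the standalone field is followed by ",0 instead, so treat this as fitted to the sample rather than a general solution.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name is not from the original post.
public static class LookaheadDemo
{
    // Sample text copied from the question.
    public const string PageSource =
        "1:30\",\"/icons/def13572_BranchInformationReport_2012-06-28.zip\",\"13572_BranchInformationReport_2012-06-28.zip\",0,\"184296\",\"Jun 28";

    // In a verbatim string (@"...") each literal double quote is written as "".
    public const string Pattern =
        @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)(?="","")";

    // Returns the value of every match found in the given text.
    public static string[] FindFileNames(string text) =>
        Regex.Matches(text, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        foreach (var name in FindFileNames(PageSource))
        {
            Console.WriteLine(name);
        }
    }
}
```

The unescaped .+ is still greedy here; the lookahead is what forces it to back off to an extension that is followed by ",".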
Upvotes: 1
Reputation: 4472
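You'll need to escape the . in the middle of the regex, because an unescaped . matches any character. Combined with the greedy +, your .+ runs on to the last extension in the text instead of stopping at the first one.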
@"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+\.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)"
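To see the difference, here is a small C# sketch that runs both the original and the escaped pattern against the sample line from the question. The class, constant, and method names are invented for this illustration. The Distinct() call is an assumption about what the asker wants: the escaped pattern finds the name twice (once inside the /icons/ path and once as its own field), so duplicates are dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name is not from the original post.
public static class DotEscapeDemo
{
    // Sample text copied from the question.
    public const string PageSource =
        "1:30\",\"/icons/def13572_BranchInformationReport_2012-06-28.zip\",\"13572_BranchInformationReport_2012-06-28.zip\",0,\"184296\",\"Jun 28";

    private const string Extensions = "(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)";

    // Original pattern: ".+" means "one or more of any character", and it is greedy.
    public const string Unescaped =
        @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+" + Extensions;

    // Fixed pattern: "\.+" means "one or more literal dots".
    public const string Escaped =
        @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+\.+" + Extensions;

    // Returns the value of every match of the given pattern in the sample text.
    public static string[] Find(string pattern) =>
        Regex.Matches(PageSource, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        Console.WriteLine("Unescaped:");
        foreach (var value in Find(Unescaped))
        {
            Console.WriteLine("  " + value);
        }

        Console.WriteLine("Escaped (distinct):");
        foreach (var value in Find(Escaped).Distinct())
        {
            Console.WriteLine("  " + value);
        }
    }
}
```

With the unescaped pattern the single match spans from the first file name through the end of the second one, quotes and comma included; with the escaped pattern each match is just the file name.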
Upvotes: 2