akhil

Reputation: 1212

How can I use 'Regex' to extract a file name?

I want to extract the file name 13572_BranchInformationReport_2012-06-28.zip from the following text:

1:30","/icons/def13572_BranchInformationReport_2012-06-28.zip","13572_BranchInformationReport_2012-06-28.zip",0,"184296","Jun 28

The regular expression code I am using is:

var fileNames = from Match m in Regex.Matches(pageSource, @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)")
                select m.Value;
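A self-contained reproduction, written as a sketch that assumes a .NET 5 or later console project with top-level statements (the sample string is the text above with its double quotes escaped):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question, with the double quotes escaped for C#.
string pageSource = "1:30\",\"/icons/def13572_BranchInformationReport_2012-06-28.zip\",\"13572_BranchInformationReport_2012-06-28.zip\",0,\"184296\",\"Jun 28";

// The pattern exactly as posted (note the unescaped '.' before the extension group).
string pattern = @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)";

var fileNames = (from Match m in Regex.Matches(pageSource, pattern)
                 select m.Value).ToList();

foreach (string name in fileNames)
{
    Console.WriteLine(name);
}
```

With this input it prints a single match that starts at the first file name and ends at the second one, with the "," between them included.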

I expected this to work.

What am I missing?

Upvotes: 1

Views: 575

Answers (3)

RB.

Reputation: 37192

You could try the following regex:

\d{5}_\w*_\d{4}-\d{2}-\d{2}\.(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)

This will match anything that:

  1. Starts with five digits
  2. Then an underscore
  3. Then any number of word characters (letters, digits, or underscores)
  4. Then an underscore
  5. Then the date part: four digits, dash, two digits, dash, and then two final digits.
  6. Then a period
  7. And finally the extension.

PowerShell example:

$text = '1:30","/icons/def13572_BranchInformationReport_2012-06-28.zip","13572_BranchInformationReport_2012-06-28.zip",0,"184296","Jun 28'

$regex = '\d{5}_\w*_\d{4}-\d{2}-\d{2}\.(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)'

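# -match returns True and stores the first match in the automatic $Matches variable
$text -match $regex

# The whole first match, i.e. the file name
$matches[0]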
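For comparison, a sketch of the same pattern from C# (the question's language), assuming a .NET 5 or later console project with top-level statements. Unlike PowerShell's -match, which only records the first match, Regex.Matches returns every match:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question, with the double quotes escaped for C#.
string text = "1:30\",\"/icons/def13572_BranchInformationReport_2012-06-28.zip\",\"13572_BranchInformationReport_2012-06-28.zip\",0,\"184296\",\"Jun 28";

// Five digits, '_', word characters, '_', yyyy-MM-dd, a literal '.', then the extension.
string regex = @"\d{5}_\w*_\d{4}-\d{2}-\d{2}\.(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)";

// Regex.Matches finds every non-overlapping match, not only the first one.
var fileNames = Regex.Matches(text, regex)
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToList();

foreach (string name in fileNames)
{
    Console.WriteLine(name);
}
```

On the sample text this prints 13572_BranchInformationReport_2012-06-28.zip twice, because the name appears both inside the /icons/ path and as a separate quoted field.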

Upvotes: 1

jonjbar

Reputation: 4066

Try the following regular expression:

[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)(?=",")
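A sketch of how this pattern behaves in C#, assuming a .NET 5 or later console project with top-level statements. The (?=",") part is a lookahead: it requires "," to follow the extension without including those characters in the match, so the greedy .+ has to back off to a file name that is followed by ",":

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question, with the double quotes escaped for C#.
string pageSource = "1:30\",\"/icons/def13572_BranchInformationReport_2012-06-28.zip\",\"13572_BranchInformationReport_2012-06-28.zip\",0,\"184296\",\"Jun 28";

// jonjbar's pattern. In a verbatim string "" stands for one quote,
// so the lookahead at the end is (?=",").
string pattern = @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)(?="","")";

var fileNames = (from Match m in Regex.Matches(pageSource, pattern)
                 select m.Value).ToList();

foreach (string name in fileNames)
{
    Console.WriteLine(name);
}
```

On the sample text this yields one match, 13572_BranchInformationReport_2012-06-28.zip (the copy inside the /icons/ path, which is the only one followed by ","). Because the . is still unescaped and .+ is still greedy, the result depends on where "," occurs in the input.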

Upvotes: 1

Nanhydrin

Reputation: 4472

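You'll need to escape the . in the middle of the regex, because an unescaped . matches any character. With the greedy + after it, .+ keeps consuming text past the end of the first file name and only stops at the last extension it can find in the input.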

@"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+\.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)"
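A sketch of the escaped version inside the question's own LINQ query, assuming a .NET 5 or later console project with top-level statements. The Distinct() call is an addition for illustration: the sample text contains the file name twice, so the raw query returns two identical matches:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question, with the double quotes escaped for C#.
string pageSource = "1:30\",\"/icons/def13572_BranchInformationReport_2012-06-28.zip\",\"13572_BranchInformationReport_2012-06-28.zip\",0,\"184296\",\"Jun 28";

// Nanhydrin's pattern: "\.+" matches one or more literal dots instead of any character.
string pattern = @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+\.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)";

var allMatches = (from Match m in Regex.Matches(pageSource, pattern)
                  select m.Value).ToList();

// The name occurs twice in the sample (once inside the /icons/ path), so drop duplicates.
var fileNames = allMatches.Distinct().ToList();

foreach (string name in fileNames)
{
    Console.WriteLine(name);
}
```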

Upvotes: 2
