akhil

Reputation: 1212

What could be the Regex to search strings from a file?

I have an HTML file containing these lines:

<script>PrintFileURL("13572_BranchInformationReport_2012-06-29.xml","13572_BranchInformationReport_2012-06-29.zip",0,"184277","Jun 29  1:30","/icons/default.gif")</script>
<script>PrintFileURL("13572_BranchInformationReport_2012-07-02.zip","13572_BranchInformationReport_2012-07-02.zip",0,"184302","Jul  2  1:30","/icons/default.gif")</script>
<script>PrintFileURL("13572_IndividualInformationReportDelta_2012-06-29_033352.zip","13572_IndividualInformationReportDelta_2012-06-29_033352.zip",0,"53147","Jun 29  3:33","/icons/default.gif")</script>
<script>PrintFileURL("13572_IndividualInformationReportDelta_2012-07-02_033458.zip","13572_IndividualInformationReportDelta_2012-07-02_033458.zip",0,"62719","Jul  2  3:35","/icons/default.gif")</script>
<script>PrintFileURL("13572_IndividualInformationReport_2012-07-01.acc","13572_IndividualInformationReport_2012-07-01.zip",0,"4033364","Jul  1 12:50","/icons/default.gif")</script>

I need to extract the file names from these lines:

13572_IndividualInformationReportDelta_2012-06-29_033352.zip

13572_IndividualInformationReportDelta_2012-07-02_033458.zip

13572_BranchInformationReport_2012-07-02.zip

13572_BranchInformationReport_2012-07-02.xml

13572_IndividualInformationReport_2012-07-01.acc

Right now I'm using the following Regex code:

var fileNames = from Match m in Regex.Matches(pageSource, @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+\.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)")
                select m.Value;

It gives me the last 3 files but not the first 2, which have an extra _033352-style suffix after the date.

Can someone provide a single Regex that extracts all of these file names?

Thanks in advance :)

Upvotes: 0

Views: 87

Answers (3)

Marek Musielak

Reputation: 27132

Try the regex below, which captures the first quoted argument on each line:

@"^[^\(]*\(\""([^""]+)\"""

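and read the file name from the first capture group:

match.Groups[1].Value;

Because the pattern starts with ^, pass RegexOptions.Multiline so that it matches at the start of every line rather than only at the start of the whole input.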
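
As a rough sketch of how this capture-group approach might be wired up: the class name FirstArgumentExtractor, the method name Extract, and the shortened sample input are all made up for illustration. The only things taken from the thread are the pattern and Groups[1]; RegexOptions.Multiline is added on the assumption that every line of the page should be scanned.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class FirstArgumentExtractor
{
    // Skips everything up to the first "(" on a line, then captures the
    // first double-quoted argument. Multiline makes ^ match at each line start.
    private static readonly Regex FirstArgument =
        new Regex(@"^[^\(]*\(\""([^""]+)\""", RegexOptions.Multiline);

    public static List<string> Extract(string pageSource)
    {
        var names = new List<string>();
        foreach (Match match in FirstArgument.Matches(pageSource))
        {
            // Groups[0] is the whole match; Groups[1] is the quoted file name.
            names.Add(match.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        // Shortened sample lines modelled on the HTML in the question.
        string pageSource =
            "<script>PrintFileURL(\"13572_BranchInformationReport_2012-07-02.zip\",\"x.zip\",0)</script>\n" +
            "<script>PrintFileURL(\"13572_IndividualInformationReportDelta_2012-06-29_033352.zip\",\"y.zip\",0)</script>";

        foreach (string name in Extract(pageSource))
        {
            Console.WriteLine(name);
        }
    }
}
```

This variant does not care what the file name looks like, so it keeps working if new extensions or suffixes appear, but it also picks up the first argument of any other call that happens to start a line.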

Upvotes: 0

burning_LEGION

Reputation: 13450

\d+_\w+_\d+-\d+-\d+(_\d+)?\.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)

Upvotes: 0

penartur

Reputation: 9912

Add (_+[0-9]+)? before the extension part of your pattern:

var fileNames = from Match m in Regex.Matches(pageSource, @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+(_+[0-9]+)?\.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)")
                select m.Value;

The optional group lets the pattern also match file names that carry a _+[0-9]+ suffix after the date, such as _033352, while still matching the names that have no suffix.
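
Here is a small, self-contained sketch of that pattern in use. The class name ReportFileNames, the method name Extract, and the trimmed sample input are assumptions for illustration only. The Distinct() call is also an addition: in the question's HTML each name can appear as both the first and the second argument, so the same value may be matched twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ReportFileNames
{
    // The question's pattern plus an optional "_digits" group before the extension.
    private const string Pattern =
        @"[0-9]+_+[A-Za-z]+_+[0-9]+-+[0-9]+-+[0-9]+(_+[0-9]+)?\.+(acc|zip|app|xml|def|enr|exm|fpr|pnd|trm)";

    public static List<string> Extract(string pageSource)
    {
        return Regex.Matches(pageSource, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Distinct() // drop repeats from the second argument
                    .ToList();
    }

    public static void Main()
    {
        // Trimmed sample lines modelled on the HTML in the question.
        string pageSource =
            "PrintFileURL(\"13572_BranchInformationReport_2012-07-02.zip\",\"13572_BranchInformationReport_2012-07-02.zip\",0)\n" +
            "PrintFileURL(\"13572_IndividualInformationReportDelta_2012-06-29_033352.zip\",\"13572_IndividualInformationReportDelta_2012-06-29_033352.zip\",0)";

        foreach (string name in Extract(pageSource))
        {
            Console.WriteLine(name);
        }
    }
}
```

Note that this matches the pattern anywhere in the page, so when the second argument differs from the first (for example a .zip next to an .xml), both names are returned.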

Upvotes: 1
