Reputation: 753
I have <A HREF="f110111.ZIP">, where f110111 is an arbitrary character sequence.
I need a C# regex match expression to extract all of the above.
E.g. the input is
<A HREF="f110111.ZIP"><A HREF="qqq.ZIP"><A HREF="gygu.ZIP">
I want the list:
Upvotes: 1
Views: 764
Reputation: 3185
I think Regular Expressions are a great way to filter text out of a given text.
This regex gets the File, Filename and Extension from the given text.
href="(?<File>(?<Filename>.*?)(?<Ext>\.\w{1,3}))"
The regex above expects an extension of 1 to 3 word characters (letters, digits or underscore) after the dot.
C# Code sample:
string regex = "href=\"(?<File>(?<Filename>.*?)(?<Ext>\\.\\w{1,3}))\"";
RegexOptions options = ((RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline) | RegexOptions.IgnoreCase);
Regex reg = new Regex(regex, options);
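The sample above only builds the Regex object. A minimal sketch of how the named groups might then be read follows; the class name HrefFileExtractor, the method Extract and the sample input are assumptions for illustration, and only RegexOptions.IgnoreCase is kept because the pattern contains no whitespace or line anchors.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class HrefFileExtractor
{
    // Same pattern as above; IgnoreCase lets "href" match "HREF".
    private static readonly Regex HrefRegex = new Regex(
        "href=\"(?<File>(?<Filename>.*?)(?<Ext>\\.\\w{1,3}))\"",
        RegexOptions.IgnoreCase);

    // Returns the value of one named group ("File", "Filename" or "Ext")
    // for every match found in the input.
    public static List<string> Extract(string input, string groupName)
    {
        var results = new List<string>();
        foreach (Match match in HrefRegex.Matches(input))
        {
            results.Add(match.Groups[groupName].Value);
        }
        return results;
    }

    public static void Main()
    {
        string input = "<A HREF=\"f110111.ZIP\"><A HREF=\"qqq.ZIP\"><A HREF=\"gygu.ZIP\">";
        Console.WriteLine(string.Join(", ", Extract(input, "Filename"))); // f110111, qqq, gygu
        Console.WriteLine(string.Join(", ", Extract(input, "File")));     // f110111.ZIP, qqq.ZIP, gygu.ZIP
    }
}
```

Asking for "Filename" gives the name without the extension, while "File" gives the whole attribute value.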
Upvotes: 0
Reputation: 20333
If you can have multiple dots in the filename:
<A HREF="([^"]+?)\.zip
If you do not have dots in the filename (just the one before zip), you can use a faster one:
<A HREF="([^".]+)
C# example:
// IgnoreCase so that ".ZIP" matches as well as ".zip"
Regex regex = new Regex("<A HREF=\"([^\"]+?)\\.zip", RegexOptions.IgnoreCase);
foreach (Match match in regex.Matches(buffer))
{
    // do something with: match.Groups[1].Value
}
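To see how the two patterns differ, here is a small sketch that runs both over the same input, written with the negated character class [^"] and an escaped dot. The class name ZipLinkPatterns, the method Names and the sample file name my.file.ZIP are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class used only to compare the two patterns.
public static class ZipLinkPatterns
{
    // Allows dots inside the name: lazily take non-quote characters up to ".zip".
    public const string AllowDots = "<A HREF=\"([^\"]+?)\\.zip";

    // Assumes no dots inside the name: stop at the first dot or quote.
    public const string NoDots = "<A HREF=\"([^\".]+)";

    // Collects capture group 1 of every match.
    // IgnoreCase so that ".ZIP" and "<a href" match too.
    public static List<string> Names(string pattern, string buffer)
    {
        var names = new List<string>();
        foreach (Match match in Regex.Matches(buffer, pattern, RegexOptions.IgnoreCase))
        {
            names.Add(match.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        string buffer = "<A HREF=\"qqq.ZIP\"><A HREF=\"my.file.ZIP\">";
        Console.WriteLine(string.Join(", ", Names(AllowDots, buffer))); // qqq, my.file
        Console.WriteLine(string.Join(", ", Names(NoDots, buffer)));    // qqq, my
    }
}
```

The second pattern cuts a dotted name at its first dot, so it is only safe when the single dot before the extension is guaranteed.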
Upvotes: 2
Reputation: 3385
What you need is the Html Agility Pack! It lets you read HTML easily and gives you a simple way to retrieve links.
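A rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced. The class name AgilityLinkExtractor and the method ZipNames are hypothetical, and the filter on the .zip extension is an assumption about what the question wants.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack; // external dependency: NuGet package "HtmlAgilityPack"

// Hypothetical helper class, not part of the library.
public static class AgilityLinkExtractor
{
    // Returns the file names (without extension) of all links that point to a .zip file.
    public static List<string> ZipNames(string html)
    {
        var names = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches the XPath expression.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return names;
        }

        foreach (HtmlNode anchor in anchors)
        {
            string href = anchor.GetAttributeValue("href", "");
            if (href.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            {
                names.Add(Path.GetFileNameWithoutExtension(href));
            }
        }
        return names;
    }

    public static void Main()
    {
        string html = "<A HREF=\"f110111.ZIP\"><A HREF=\"qqq.ZIP\"><A HREF=\"gygu.ZIP\">";
        Console.WriteLine(string.Join(", ", ZipNames(html)));
    }
}
```

Because the parser reads the attribute value directly, quoting style and attribute order in the tag no longer matter.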
Upvotes: 3
Reputation: 19862
NO NO! Do not use Regex to parse HTML!
Try an XML Parser. Or XPath perhaps.
Upvotes: 0