Jamie Hartnoll

Reputation: 7341

Using regular expressions, how do I exclude certain links from the matches?

Following on from a post I made earlier, I am making progress with what I need, but I don't know much about how regular expressions work, so I'm stuck!

This line:

FilesM = Regex.Matches(StrFile, "<link.*?href=""(.*?)"".*? />")

It extracts all <link ...> elements from the HTML of my page so that I can compile a combined style file.

However, I need to exclude any media="print" links.

I am also trying to combine JS scripts:

FilesM1 = Regex.Matches(StrFile, "<script.*?src=""(.*?)"".*?></script>")

This does the job, but in this case I want to exclude any scripts that are not hosted locally. I'd like to do this by excluding any script whose src starts with "http".

So how would I exclude these two cases from the match collection?

Upvotes: 0

Views: 391

Answers (1)

Steven Doggart

Reputation: 43743

I know this isn't exactly what you are looking for, but, in case you are interested, here's an example of how to find just the elements you care about using XPath:

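' Requires Imports System.Xml.
' html must be well-formed XML (for example XHTML); otherwise LoadXml throws an XmlException.
Dim doc As New XmlDocument()
doc.LoadXml(html)
' link elements that have an href and whose media is either absent or not "print"
Dim linkNodes As XmlNodeList = doc.SelectNodes("descendant-or-self::link[(@href) and (not(@media) or (@media != 'print'))]")
' script elements that have a src that does not start with "http"
Dim scriptNodes As XmlNodeList = doc.SelectNodes("descendant-or-self::script[(@src) and (not(starts-with(@src,'http')))]")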

The XmlDocument.SelectNodes method returns all elements that match the given XPath.
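
To show what comes back from SelectNodes, here is a minimal sketch of a console program that runs the two queries and prints the attribute values. The sample markup and the module name are made up for illustration, and it assumes the page is well-formed XML.

```vbnet
Imports System
Imports System.Xml

Module SelectNodesDemo
    Sub Main()
        ' Hypothetical sample markup; LoadXml throws an XmlException if it is not well-formed XML.
        Dim html As String =
            "<html><head>" &
            "<link rel=""stylesheet"" href=""main.css"" />" &
            "<link rel=""stylesheet"" href=""print.css"" media=""print"" />" &
            "<script src=""js/app.js""></script>" &
            "<script src=""http://cdn.example.com/lib.js""></script>" &
            "</head><body /></html>"

        Dim doc As New XmlDocument()
        doc.LoadXml(html)

        Dim linkNodes As XmlNodeList = doc.SelectNodes("descendant-or-self::link[(@href) and (not(@media) or (@media != 'print'))]")
        Dim scriptNodes As XmlNodeList = doc.SelectNodes("descendant-or-self::script[(@src) and (not(starts-with(@src,'http')))]")

        ' Each result is an XmlNode; the attribute value is read through its Attributes collection.
        For Each node As XmlNode In linkNodes
            Console.WriteLine(node.Attributes("href").Value)   ' main.css
        Next
        For Each node As XmlNode In scriptNodes
            Console.WriteLine(node.Attributes("src").Value)    ' js/app.js
        Next
    End Sub
End Module
```

The print stylesheet and the externally hosted script are filtered out by the XPath conditions, so no extra checks are needed inside the loops.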

In the XPath string, descendant-or-self:: means the search covers the current node (here, the root of the document) and all of its descendants, looking for the element name that follows. If it were left out, only the direct children of the current node would be checked.

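The [] clauses provide conditions, and the @ sign specifies an attribute name. So for instance, link[@media != 'print'] matches link elements that have a media attribute whose value is not "print". It does not match link elements that have no media attribute at all, which is why the full expression above also includes not(@media).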
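
The difference between testing @media != 'print' on its own and combining it with not(@media) is easy to miss, so here is a small sketch that counts the matches for both forms. The sample markup and the variable names narrow and broad are assumptions made for the illustration.

```vbnet
Imports System
Imports System.Xml

Module MediaConditionDemo
    Sub Main()
        ' Hypothetical sample markup: one link without media, one for screen, one for print.
        Dim html As String =
            "<html><head>" &
            "<link rel=""stylesheet"" href=""main.css"" />" &
            "<link rel=""stylesheet"" href=""screen.css"" media=""screen"" />" &
            "<link rel=""stylesheet"" href=""print.css"" media=""print"" />" &
            "</head><body /></html>"

        Dim doc As New XmlDocument()
        doc.LoadXml(html)

        ' Only links that have a media attribute with a value other than "print".
        Dim narrow As XmlNodeList = doc.SelectNodes("descendant-or-self::link[@media != 'print']")

        ' Links with no media attribute, plus links whose media is not "print".
        Dim broad As XmlNodeList = doc.SelectNodes("descendant-or-self::link[not(@media) or (@media != 'print')]")

        Console.WriteLine(narrow.Count)   ' 1 (screen.css)
        Console.WriteLine(broad.Count)    ' 2 (main.css and screen.css)
    End Sub
End Module
```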

Simply listing an attribute name by itself in a condition means that you are checking for the existence of that attribute. For instance, link[@href] matches all link elements that do have an href attribute.
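
If you would rather stay with regular expressions as in the question, one possible approach is a negative lookahead. This is only a sketch under assumptions: the sample string stands in for StrFile, the patterns expect double-quoted attributes and self-closing link tags as in the original expressions, and regex matching of HTML remains fragile on real pages.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexExcludeDemo
    Sub Main()
        ' Hypothetical sample markup standing in for StrFile from the question.
        Dim StrFile As String =
            "<link rel=""stylesheet"" href=""main.css"" />" &
            "<link rel=""stylesheet"" href=""print.css"" media=""print"" />" &
            "<script src=""js/app.js""></script>" &
            "<script src=""http://cdn.example.com/lib.js""></script>"

        ' The lookahead (?![^>]*media="print") rejects a tag that contains media="print" before its closing ">".
        Dim FilesM As MatchCollection = Regex.Matches(StrFile, "<link(?![^>]*media=""print"")[^>]*?href=""([^""]*)""[^>]*?/>")

        ' The lookahead (?!http) rejects a src value that begins with "http" (this also covers "https").
        Dim FilesM1 As MatchCollection = Regex.Matches(StrFile, "<script[^>]*?src=""(?!http)([^""]*)""[^>]*></script>")

        For Each m As Match In FilesM
            Console.WriteLine(m.Groups(1).Value)   ' main.css
        Next
        For Each m As Match In FilesM1
            Console.WriteLine(m.Groups(1).Value)   ' js/app.js
        Next
    End Sub
End Module
```

Using [^>]* and [^""]* instead of .*? keeps each match inside a single tag and a single attribute value, which avoids one tag's href being paired with a later tag's closing characters.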

Upvotes: 1
