Reputation: 330922
I have a string stream that contains many strings like this:
<A style="FONT-WEIGHT: bold" id=thread_title_559960 href="http://microsoft.com/forum/f80/topicName-1234/">Beautiful Topic Name</A> </DIV>
I am trying to extract only the links that start with:
style="FONT-WEIGHT: bold
So in the end I will have the link:
http://microsoft.com/forum/f80/topicName-1234/
Topic Id:
1234
Topic Display Name:
Beautiful Topic Name
I am currently using this pattern, but it doesn't do everything I need:
"href=\"(?<url>.*?)\">(?<title>.*?)</A>"
The problem is that it also matches other links, because they contain href too.
Also, to use Regex, I joined all the lines into a single string. Does regex care about newlines? That is, can a match continue across strings that span multiple lines?
Please help me with the pattern.
Upvotes: 2
Views: 338
Reputation: 162781
In regular expressions, the dot wildcard does not match newlines by default. If you want to match any character including newlines, use [^\x00] instead of the dot (.). This matches everything except the null character, which for ordinary text means it matches everything.
Try this:
<A\s+style="FONT-WEIGHT: bold"\s+id=(\S+)\s+href="([^"]*)">([^\x00]*?)</A>
If you're trying to assign this to a string using double quotes, you'll need to escape the quotes and backslashes. It'll look something like this:
myVar = "<A\\s+style=\"FONT-WEIGHT: bold\"\\s+id=(\\S+)\\s+href=\"([^\"]*)\">([^\\x00]*?)</A>";
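As a rough sketch of how the pattern above could be applied, here is a small C# example. It assumes .NET's System.Text.RegularExpressions, since the thread does not name the language explicitly. The class name TopicLinkDemo, the Extract helper, and the second regex that pulls the trailing number out of the URL as the topic id are illustrative assumptions, not part of the original answer. A verbatim string (@"...") is used so that backslashes stay literal and only the quotes need doubling.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TopicLinkDemo
{
    // Same pattern as above, written as a verbatim string:
    // backslashes are literal, double quotes are written as "".
    private const string Pattern =
        @"<A\s+style=""FONT-WEIGHT: bold""\s+id=(\S+)\s+href=""([^""]*)"">([^\x00]*?)</A>";

    // Hypothetical helper: returns (url, topic id, title) for each bold link.
    public static List<(string Url, string TopicId, string Title)> Extract(string html)
    {
        var results = new List<(string Url, string TopicId, string Title)>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            string url = m.Groups[2].Value;
            string title = m.Groups[3].Value;

            // Assumption: the topic id is the run of digits after the last '-'
            // at the end of the URL, as in ".../topicName-1234/".
            Match id = Regex.Match(url, @"-(\d+)/?$");
            results.Add((url, id.Success ? id.Groups[1].Value : "", title));
        }
        return results;
    }

    public static void Main()
    {
        string html =
            "<A style=\"FONT-WEIGHT: bold\" id=thread_title_559960 " +
            "href=\"http://microsoft.com/forum/f80/topicName-1234/\">Beautiful Topic Name</A> </DIV>";

        foreach (var link in Extract(html))
        {
            Console.WriteLine(link.Url);     // http://microsoft.com/forum/f80/topicName-1234/
            Console.WriteLine(link.TopicId); // 1234
            Console.WriteLine(link.Title);   // Beautiful Topic Name
        }
    }
}
```

Because the pattern anchors on the style attribute, links that merely contain href are skipped.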
Upvotes: 4
Reputation: 96477
You can make the dot (.) in a pattern match newlines by using the RegexOptions.Singleline option:
Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
So if your title spanned multiple lines, with the option enabled the (?<title>.*?) part of the pattern would continue across lines while attempting to find a match.
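To illustrate the difference, here is a minimal C# sketch that runs the pattern from the question with and without RegexOptions.Singleline. The class name SinglelineDemo, the TryFindTitle helper, and the sample input with a line break inside the title are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Pattern from the question, with named groups for the URL and the title.
    private const string Pattern = "href=\"(?<url>.*?)\">(?<title>.*?)</A>";

    // Hypothetical helper: reports whether a title was found under the given options.
    public static bool TryFindTitle(string html, RegexOptions options, out string title)
    {
        Match m = Regex.Match(html, Pattern, options);
        title = m.Success ? m.Groups["title"].Value : "";
        return m.Success;
    }

    public static void Main()
    {
        // The title deliberately contains a newline.
        string html =
            "<A href=\"http://microsoft.com/forum/f80/topicName-1234/\">Beautiful\nTopic Name</A>";

        // Default options: the dot stops at \n, so no match is found.
        Console.WriteLine(TryFindTitle(html, RegexOptions.None, out _));

        // Singleline: the dot also matches \n, so the title spans both lines.
        if (TryFindTitle(html, RegexOptions.Singleline, out string title))
        {
            Console.WriteLine(title);
        }
    }
}
```

Note that Singleline only changes what the dot matches; it is unrelated to RegexOptions.Multiline, which changes the meaning of ^ and $.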
Upvotes: 2