Reputation: 548
I am trying to match:
<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n
My regex at the moment is:
Match regex = Regex.Match(cleanedUpHtml, "Manufacturer(.*?)\n\n\n", RegexOptions.IgnoreCase);
However, it does not work.
The (.*?) should match everything in between.
Upvotes: 2
Views: 1516
Reputation: 1742
Generally I prefer to clean up the string by removing HTML tags and new-line characters before using the regex.
(.*?) stops capturing at a \n character, because by default the dot does not match a line break. You might use a more generic group instead, like ([\w\W]*?)
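To illustrate the difference, here is a minimal sketch using the sample string from the question. The class and method names (DotVersusAnyChar, Matches) are made up for this example, and the patterns are written as verbatim strings so that \n reaches the regex engine as an escape.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotVersusAnyChar
{
    // Hypothetical helper: true if the pattern matches, using only IgnoreCase
    // (no RegexOptions.Singleline), as in the question.
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n";

        // The dot cannot cross the \n after </h4>, so this fails.
        Console.WriteLine(Matches(html, @"Manufacturer(.*?)\n\n\n"));      // prints False

        // [\w\W] matches any character, including \n, so this succeeds.
        Console.WriteLine(Matches(html, @"Manufacturer([\w\W]*?)\n\n\n")); // prints True
    }
}
```

[\s\S] is a common equivalent spelling of the same "any character" class.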
Upvotes: 0
Reputation: 626728
Here are 2 things I find important:
Whenever you declare a regex pattern in C#, it is advisable to use verbatim string literals, i.e. @"PATTERN". Backslashes are not treated as C# escape sequences there, which simplifies writing regex patterns.
RegexOptions.Singleline must be used so that the dot also matches a line break (\n). Without it, the dot matches every character except \n.
Here is my code snippet:
var str = "<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n";
var regex = Regex.Match(str, @"Manufacturer(.*?)\n\n\n",
RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (regex.Success)
MessageBox.Show("\"" + regex.Value + "\"");
The regex.Value is:
"Manufacturer</h4>

 Gigabyte


"
Best regards.
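As a follow-up to the snippet above: if only the manufacturer name is wanted, the captured group can be read instead of the whole match. This is a sketch rather than a definitive solution. The helper name ExtractManufacturer is hypothetical, and adding </h4> to the pattern assumes the closing tag always follows the heading text directly, as it does in the sample.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ManufacturerExample
{
    // Hypothetical helper (not from the original post): returns the trimmed
    // manufacturer name, or an empty string when nothing matches.
    public static string ExtractManufacturer(string html)
    {
        Match m = Regex.Match(html, @"Manufacturer</h4>(.*?)\n\n\n",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Groups[1] holds only what (.*?) captured; Trim() drops the
        // surrounding line breaks and spaces.
        return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        string html = "<h4>Manufacturer</h4>\n\n Gigabyte\n\n\n";
        Console.WriteLine(ExtractManufacturer(html)); // prints Gigabyte
    }
}
```

With RegexOptions.Singleline in place, there is no need to swap \n for a placeholder before matching.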
Upvotes: 2
Reputation: 548
I replaced \n with a placeholder value and then searched for that placeholder with the regex. It is working for the time being, but it may not be the best approach. Any recommendations are appreciated.
cleanedUpHtml = cleanedUpHtml.Replace("\n", "p19o9");
Match regex = Regex.Match(cleanedUpHtml, "Manufacturer(.*?)p19o9p19o9p19o9", RegexOptions.IgnoreCase);
Upvotes: 1