I'm writing a simple screen-scraping application to experiment with the HtmlAgilityPack library. After getting it to work on several different types of HtmlNodes, I figured I'd get fancy and add a regex for email addresses as well. The problem is that the application never finds any matches, or perhaps it does but they aren't being returned properly. This happens even on sites known to contain email addresses. Can anyone spot what I'm doing wrong?
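string url = String.Format("http://{0}", mainForm.Target);
string reg = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b";
try
{
    // Download the page and run the email pattern over the raw HTML
    using (WebClient wClient = new WebClient())
    using (Stream data = wClient.OpenRead(url))
    using (StreamReader read = new StreamReader(data))
    {
        MatchCollection matches = Regex.Matches(read.ReadToEnd(), reg, RegexOptions.IgnoreCase | RegexOptions.Multiline);
        foreach (Match match in matches)
        {
            textBox1.AppendText(match.ToString() + Environment.NewLine);
        }
    }
}
catch (WebException ex)
{
    // Report download failures instead of leaving the try block unclosed
    textBox1.AppendText(ex.Message + Environment.NewLine);
}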
Inspect the string returned by read.ReadToEnd() and check whether your regex finds email addresses in it. I suspect your problem has nothing to do with StreamReader.
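A minimal sketch of that check, assuming C# 9+ top-level statements. The page text below is a hypothetical stand-in for whatever read.ReadToEnd() returns; the pattern is copied unchanged from the question. If the text plainly contains an address but the pattern finds nothing, the download is fine and the regex is at fault.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for read.ReadToEnd(): text that plainly contains an address.
string page = "<a href=\"mailto:info@example.com\">info@example.com</a>";

// The pattern exactly as written in the question (regular string literal).
string reg = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b";

bool containsAt = page.Contains("@");
int matchCount = Regex.Matches(page, reg, RegexOptions.IgnoreCase | RegexOptions.Multiline).Count;

// An '@' is present but the pattern matches nothing, so the regex, not StreamReader, is the suspect.
Console.WriteLine($"contains '@': {containsAt}, regex matches: {matchCount}");
```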
Use a verbatim string literal (prefix the string with @):
string reg = @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b";
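Without the @, the C# compiler interprets \b as the escape sequence for the backspace character (U+0008), so the regex engine receives a literal backspace instead of a word-boundary assertion. Also, your last period should be escaped as \. so that it matches only a literal period; an unescaped . matches any character.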
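A small sketch of both fixes, assuming C# 9+ top-level statements. The sample HTML and the string "user@examplecom" are hypothetical inputs chosen for illustration. It shows that "\b" in a regular literal is a single backspace character, that the question's pattern therefore finds nothing, and that an unescaped period lets an address with no dot after the @ slip through.

```csharp
using System;
using System.Text.RegularExpressions;

// In a regular literal, "\b" is the C# escape for backspace (U+0008).
// In a verbatim literal, @"\b" is two characters, '\' and 'b', which the
// regex engine reads as a word-boundary assertion.
string regular = "\b";
string verbatim = @"\b";
Console.WriteLine($"regular: length {regular.Length}, code {(int)regular[0]}"); // length 1, code 8
Console.WriteLine($"verbatim: length {verbatim.Length}");                       // length 2

// Hypothetical sample text standing in for the downloaded HTML.
string html = "<p>Contact: jane.doe@example.com or sales@example.org</p>";

string broken = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b";        // question's pattern
string fixedPattern = @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"; // corrected pattern

int brokenCount = Regex.Matches(html, broken, RegexOptions.IgnoreCase).Count;
int fixedCount = Regex.Matches(html, fixedPattern, RegexOptions.IgnoreCase).Count;
Console.WriteLine($"broken: {brokenCount} matches, fixed: {fixedCount} matches"); // 0 and 2

// Why the period needs escaping: with an unescaped '.', a string with no dot
// after the '@' is still accepted, because '.' matches any character.
string loose = @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b";
bool looseAcceptsNoDot = Regex.IsMatch("user@examplecom", loose, RegexOptions.IgnoreCase);
bool strictAcceptsNoDot = Regex.IsMatch("user@examplecom", fixedPattern, RegexOptions.IgnoreCase);
Console.WriteLine($"loose: {looseAcceptsNoDot}, strict: {strictAcceptsNoDot}"); // True, False
```

The verbatim literal is the usual choice for regex patterns in C# because it lets the pattern be written exactly as the regex engine should see it, without doubling every backslash.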