Stev0

Reputation: 605

C# RegEx on a StreamReader will not return matches

I'm writing a simple screen-scraping application to play around with the HTMLAgilityPack library. After getting it to work on several different types of HtmlNodes, I figured I'd get fancy and throw in a Regex for email addresses as well. The only problem is that the application never finds any matches, or perhaps it finds them but does not return them properly. This happens even on sites known to contain email addresses. Can anyone spot what I'm doing wrong here?

    string url = String.Format("http://{0}", mainForm.Target);
    string reg = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b";
    try
    {
        WebClient wClient = new WebClient();
        Stream data = wClient.OpenRead(url);
        StreamReader read = new StreamReader(data);
        MatchCollection matches = Regex.Matches(read.ReadToEnd(), reg, RegexOptions.IgnoreCase | RegexOptions.Multiline);
        foreach (Match match in matches)
        {
            textBox1.AppendText(match.ToString() + Environment.NewLine);
        }

Upvotes: 0

Views: 1256

Answers (2)

empi

Reputation: 15881

Inspect the string returned by read.ReadToEnd() and check whether your regex finds email addresses in that string. My guess is that the problem has nothing to do with StreamReader.
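One way to run that check is to separate "what text did I get?" from "does the pattern match it?". The sketch below is only an illustration: the ScrapeDiagnostics class, the Describe helper, and the sample HTML are hypothetical names and data, and a StringReader stands in for the StreamReader from the question so the example runs without a network call.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class ScrapeDiagnostics
{
    // Hypothetical helper: reads everything from the reader once, then reports
    // the text length, how many '@' characters it contains, and how many
    // matches the given pattern produces on that same text.
    public static string Describe(TextReader reader, string pattern)
    {
        string html = reader.ReadToEnd();

        int atSigns = 0;
        foreach (char c in html)
        {
            if (c == '@') atSigns++;
        }

        int matches = Regex.Matches(html, pattern, RegexOptions.IgnoreCase).Count;
        return string.Format("length={0}, '@' count={1}, regex matches={2}",
                             html.Length, atSigns, matches);
    }

    public static void Main()
    {
        // Pattern exactly as written in the question (regular string literal).
        string pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b";

        // Made-up page content standing in for the downloaded HTML.
        using (var reader = new StringReader("<p>Mail us at info@example.com</p>"))
        {
            Console.WriteLine(Describe(reader, pattern));
        }
    }
}
```

If the text is non-empty and contains '@' characters but the match count is zero, the download is working and the pattern is the place to look.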

Upvotes: 0

Matthew Flaschen

Reputation: 284796

Use a verbatim string literal (note the @ prefix):

string reg = @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b";

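Without the @, \b in a regular C# string literal is the escape sequence for a backspace character (U+0008), so the regex engine never sees the word-boundary anchor. Also, your last period should be \. so that it matches only a literal period; an unescaped . matches any character except a newline.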
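A minimal sketch of the difference, assuming a made-up sample string (the EmailRegexDemo class and CountMatches helper are hypothetical names used only for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailRegexDemo
{
    // Pattern from the question: in a regular string literal the compiler
    // turns "\b" into a single backspace character (U+0008).
    public const string Broken = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b";

    // Verbatim literal: the backslashes reach the regex engine unchanged,
    // so \b is a word boundary and \. is a literal period.
    public const string Fixed = @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b";

    // Counts case-insensitive matches of pattern in input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        // Hypothetical sample text, not taken from any real page.
        string sample = "Contact: someone@example.com or admin@example.org";

        Console.WriteLine(Broken[0] == '\u0008');         // is the first char a backspace?
        Console.WriteLine(CountMatches(sample, Broken));  // matches with the original pattern
        Console.WriteLine(CountMatches(sample, Fixed));   // matches with the verbatim pattern
    }
}
```

The original pattern can only match text that has an actual backspace character on either side of the address, which ordinary HTML does not contain.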

Upvotes: 2
