Reputation: 861

Search Web Content with C#

How do you search a website's source code with C#? It's hard to explain, so here's the source for doing it in Python:

import urllib2, re
word = "How to ask"
source = urllib2.urlopen("http://stackoverflow.com").read()
if re.search(word,source):
     print "Found it "+word

Upvotes: 4

Answers (3)

Wolfwyrd

Reputation: 15916

If you want to access the raw HTML from a web page you need to do the following:

1. Use an HttpWebRequest to connect to the page
2. Open the connection and read the response stream into a string
3. Search the response for your content

So the code looks something like this:

using System.IO;
using System.Net;

string pageContent;
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create("http://example.com/page.html");

// Dispose the response and the reader once the body has been read.
using (HttpWebResponse myRes = (HttpWebResponse)myReq.GetResponse())
using (StreamReader sr = new StreamReader(myRes.GetResponseStream()))
{
    pageContent = sr.ReadToEnd();
}

if (pageContent.Contains("YourSearchWord"))
{
    // Found it
}

Upvotes: 8

JohannesH

Reputation: 6450

I guess this is as close as you'll get in C# to your Python code.

using System;
using System.Net;

class Program
{
    static void Main()
    {
        string word = "How to ask";
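        string source;
        // WebClient is IDisposable, so release it once the download finishes.
        using (var client = new WebClient())
        {
            source = client.DownloadString("http://stackoverflow.com/");
        }
        if (source.Contains(word))
            Console.WriteLine("Found it " + word);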
    }
}

Python's re.search is case sensitive unless you pass re.IGNORECASE, so Contains matches that behavior. If you want a case-insensitive search, you could use...

if(source.IndexOf(word, StringComparison.InvariantCultureIgnoreCase) > -1)

instead.
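
One difference worth noting: re.search treats its first argument as a regular expression, while Contains and IndexOf do a literal substring test. If you need the regex behavior, something like the sketch below may be closer. PageSearch and its method names are hypothetical, made up for illustration; the helpers work on a string that has already been downloaded.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the .NET framework.
static class PageSearch
{
    // Regex search, roughly what Python's re.search(pattern, source) does.
    public static bool MatchesPattern(string source, string pattern, bool ignoreCase = false)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.IsMatch(source, pattern, options);
    }

    // Literal search: escape the word first so characters such as '.' or '?'
    // are matched as-is instead of being read as regex syntax.
    public static bool ContainsLiteral(string source, string word, bool ignoreCase = false)
    {
        return MatchesPattern(source, Regex.Escape(word), ignoreCase);
    }
}
```

With the downloaded page in source, PageSearch.MatchesPattern(source, word) would stand in for source.Contains(word) when word is meant as a pattern.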

Upvotes: 2

Canavar

Reputation: 48088

Here is the code for getting the HTML source of a page. You can add your search method afterwards:

using System.IO;
using System.Net;

string url = "http://someurl.com/default.aspx";
WebRequest webRequest = WebRequest.Create(url);
string source;

// Dispose the response and the reader once the page has been read.
using (WebResponse response = webRequest.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    source = reader.ReadToEnd();
}
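
As for the search method itself, here is one possible sketch. It assumes you want the position of every occurrence and not just a yes/no answer, and that a case-insensitive comparison is acceptable as the default. SourceSearch and FindAll are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the .NET framework.
static class SourceSearch
{
    // Returns the zero-based index of every occurrence of word in source.
    public static List<int> FindAll(string source, string word,
        StringComparison comparison = StringComparison.OrdinalIgnoreCase)
    {
        var positions = new List<int>();
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(word))
            return positions;

        int index = source.IndexOf(word, comparison);
        while (index >= 0)
        {
            positions.Add(index);
            // Resume after the current match so occurrences do not overlap.
            index = source.IndexOf(word, index + word.Length, comparison);
        }
        return positions;
    }
}
```

Calling SourceSearch.FindAll(source, "YourSearchWord") on the downloaded string gives a list that is empty when the word does not appear.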

Hope this helps.

Upvotes: 0
