Reputation: 14148
Could I get some very easy-to-follow code examples on the following:
thanks
Upvotes: 2
Views: 1407
Reputation: 27803
If you want a pure C# way to traverse web pages, a good place to look is WatiN. It allows you to easily open a web browser and go through the web page (and actions) via C# code.
Here's an example of searching Google with the API (taken from their docs):
using System;
using WatiN.Core;

namespace WatiNGettingStarted
{
    class WatiNConsoleExample
    {
        [STAThread]
        static void Main(string[] args)
        {
            // Open a new Internet Explorer window and
            // go to the Google website.
            IE ie = new IE("http://www.google.com");

            // Find the search text field and type WatiN in it.
            ie.TextField(Find.ByName("q")).TypeText("WatiN");

            // Click the Google search button.
            ie.Button(Find.ByValue("Google Search")).Click();

            // Uncomment the following line if you want to close
            // Internet Explorer and the console window immediately.
            //ie.Close();
        }
    }
}
Upvotes: 1
Reputation: 1038790
You may take a look at the Html Agility Pack and/or SgmlReader. Here's an example using SgmlReader that selects all nodes in the DOM whose text contains 'Products':
using System;
using System.Xml;
using Sgml;

class Program
{
    static void Main()
    {
        using (var reader = new SgmlReader())
        {
            reader.Href = "http://www.microsoft.com";
            var doc = new XmlDocument();
            doc.Load(reader);
            var nodes = doc.SelectNodes("//*[contains(text(), 'Products')]");
            foreach (XmlNode node in nodes)
            {
                Console.WriteLine(node.OuterXml);
            }
        }
    }
}
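For completeness, the Html Agility Pack side of the suggestion looks very similar. This is a minimal sketch, assuming the HtmlAgilityPack package is referenced; `HtmlWeb` downloads and parses the page in one step, and the same XPath query applies:

```csharp
using System;
using HtmlAgilityPack; // third-party package; assumed to be referenced

class HapExample
{
    static void Main()
    {
        // Download and parse the page in one call.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://www.microsoft.com");

        // Same XPath query as the SgmlReader example.
        var nodes = doc.DocumentNode.SelectNodes("//*[contains(text(), 'Products')]");
        if (nodes != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode node in nodes)
            {
                Console.WriteLine(node.OuterHtml);
            }
        }
    }
}
```

One practical difference: the Html Agility Pack tolerates malformed real-world HTML without requiring an XML-clean document.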
Upvotes: 1
Reputation: 4929
Here you can find a four-part tutorial on what you want (How to Write a Search Engine). This is the first part; the other parts are linked from it.
Upvotes: 1
Reputation: 54377
Here is code that uses a WebRequest object to retrieve data and captures the response as a stream.
public static Stream GetExternalData( string url, string postData, int timeout )
{
    ServicePointManager.ServerCertificateValidationCallback += delegate( object sender,
                                                                         X509Certificate certificate,
                                                                         X509Chain chain,
                                                                         SslPolicyErrors sslPolicyErrors )
    {
        // if we trust the callee implicitly, return true...otherwise, perform validation logic
        return true;
    };

    WebRequest request = null;
    HttpWebResponse response = null;

    try
    {
        request = WebRequest.Create( url );
        request.Timeout = timeout; // force a quick timeout

        if( postData != null )
        {
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.ContentLength = postData.Length;

            using( StreamWriter requestStream = new StreamWriter( request.GetRequestStream(), System.Text.Encoding.ASCII ) )
            {
                requestStream.Write( postData );
                requestStream.Close();
            }
        }

        response = (HttpWebResponse)request.GetResponse();
    }
    catch( WebException ex )
    {
        Log.LogException( ex );
    }
    finally
    {
        request = null;
    }

    if( response == null || response.StatusCode != HttpStatusCode.OK )
    {
        if( response != null )
        {
            response.Close();
            response = null;
        }

        return null;
    }

    return response.GetResponseStream();
}
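A hypothetical call site for the method above might look like this (the URL and form data are placeholders; the caller owns the returned stream and should dispose it):

```csharp
// Post a form field and read the response body as text.
Stream stream = GetExternalData( "http://www.example.com/search", "q=WatiN", 5000 );
if( stream != null )
{
    using( var reader = new StreamReader( stream ) )
    {
        string html = reader.ReadToEnd();
        Console.WriteLine( html.Length );
    }
}
```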
For managing the response, I have a custom XHTML parser that I use, but it is thousands of lines of code. There are several publicly available parsers (see Darin's comment).
EDIT: per the OP's question, headers can be added to the request to emulate a user agent. For example:
request = (HttpWebRequest)WebRequest.Create( url );
request.Accept = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*";
request.Timeout = timeout;
request.Headers.Add( "Cookie", cookies );
//
// manifest as a standard user agent
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US)";
Upvotes: 2
Reputation: 33870
You could also use Selenium to easily traverse the DOM and grab the values of the fields. It will also open the browser for you automatically.
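A minimal sketch with the Selenium WebDriver .NET bindings, assuming the Selenium.WebDriver package is referenced and Firefox with its driver is installed (the Google search form is just an illustrative target):

```csharp
using System;
using OpenQA.Selenium;          // Selenium WebDriver bindings
using OpenQA.Selenium.Firefox;  // assumes Firefox and its driver are installed

class SeleniumExample
{
    static void Main()
    {
        IWebDriver driver = new FirefoxDriver();
        try
        {
            driver.Navigate().GoToUrl("http://www.google.com");

            // Find the search box by its name attribute and submit a query.
            IWebElement searchBox = driver.FindElement(By.Name("q"));
            searchBox.SendKeys("WatiN");
            searchBox.Submit();

            Console.WriteLine(driver.Title);
        }
        finally
        {
            driver.Quit(); // closes the browser and ends the session
        }
    }
}
```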
Upvotes: 0