user254197

Reputation: 881

Passing language preference to HtmlAgilityPack when retrieving web pages

My aim is to read specific containers, tags, and attributes from a website (for a hobby project). Everything works except getting the German translation of a value. I normally see the German text when I open the site manually in my browser (maybe the site derives this from the user agent), but my program only returns the English value.

The working C# console code:

    List<string> href = new List<string>();
    List<string> titles = new List<string>();

    for (int i = 0; i < 1; i++)
    {
        var webOverview = new HtmlWeb();
        var documentOverview = webOverview.Load("http://gatherer.wizards.com/Pages/Search/Default.aspx?page=0&format=[%22Commander%22]");
        webOverview.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0"; // updated
        webOverview.AutoDetectEncoding = true; // updated
        var pageOverview = documentOverview.DocumentNode;

        HtmlNode[] hrefList = pageOverview.QuerySelectorAll("td.leftCol").ToArray();
        HtmlNode[] titleList = pageOverview.QuerySelectorAll("div.cardInfo").ToArray();

        for (int rowcounter = 0; rowcounter < hrefList.Count(); rowcounter++)
        {
            var hrefValue = hrefList[rowcounter].QuerySelector("a").Attributes["href"].Value;
            var titleValue = titleList[rowcounter].QuerySelector("span.cardTitle").InnerText;

            href.Add(hrefValue);
            titles.Add(titleValue);
            Console.WriteLine(rowcounter.ToString() + ". " + hrefValue + ": " + titleValue + "\n\n");
        }

    }
    Console.WriteLine("Links: " + href.Count + " Titles: " + titles.Count + "\n");

In my browser I see something like "Schlachthaus-Ghul (Abattoir Ghoul)" (without setting any language properties), but when I execute my program I get "Abattoir Ghoul", which is produced by the statement

HtmlNode[] titleList = pageOverview.QuerySelectorAll("div.cardInfo").ToArray();

but I need "Schlachthaus-Ghul (Abattoir Ghoul)" instead of just the English text.

Maybe I need something like this user agent? I could not find a parameter in the URL that tells the server I also want German in the title information.

I updated two lines (marked with comments), and it did not change anything (user agent taken from http://www.whatsmyuseragent.com).

Upvotes: 2

Views: 1659

Answers (1)

jessehouwing

Reputation: 114751

You'll need to tell the server that you expect a German page by sending it the "Accept-Language" header:

var webOverview = new HtmlWeb();
webOverview.PreRequest += (request) =>
{
    request.Headers.Add("Accept-Language", "de-DE");
    return true;
};
var documentOverview = webOverview.Load("http://gatherer.wizards.com/Pages/Search/Default.aspx?page=0&format=[%22Commander%22]");

This is basically shorthand for:

public void Yourmethod()
{
   var webOverview = new HtmlWeb();
   webOverview.PreRequest += SendGermanLanguageHeaders;

   var documentOverview = webOverview.Load("http://gatherer.wizards.com/Pages/Search/Default.aspx?page=0&format=[%22Commander%22]");
}

private bool SendGermanLanguageHeaders(HttpWebRequest request)
{
   request.Headers.Add("Accept-Language", "de-DE");
   return true;
}

The lambda expression defines an anonymous method: short, inline, and in context. That method is then added to the PreRequest handler list using the += operator you're probably familiar with. More about this construct can be found here.
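As a variation on the named handler above, here is a minimal sketch that sends a weighted language list instead of a single value, so the server can fall back to English if it has no German text. The class name LanguageHeaders and the helper BuildAcceptLanguage are hypothetical names made up for this illustration, and the 0.1 step between quality values is an arbitrary choice. Whether the site honors the fallback order is an assumption, not something tested against the Gatherer site.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Net;

public static class LanguageHeaders
{
    // Hypothetical helper: builds a weighted Accept-Language value.
    // ("de-DE", "de", "en") -> "de-DE,de;q=0.9,en;q=0.8"
    // Meant for a handful of languages; weights never drop below 0.1.
    public static string BuildAcceptLanguage(params string[] languages)
    {
        return string.Join(",", languages.Select((lang, i) =>
            i == 0
                ? lang // the first entry gets the implicit weight 1.0
                : lang + ";q=" + Math.Max(0.1, 1.0 - 0.1 * i)
                      .ToString("0.0", CultureInfo.InvariantCulture)));
    }

    // Same shape as SendGermanLanguageHeaders in the answer:
    // takes the outgoing request and returns a bool.
    public static bool PreferGerman(HttpWebRequest request)
    {
        // The indexer overwrites any existing value instead of appending to it.
        request.Headers["Accept-Language"] = BuildAcceptLanguage("de-DE", "de", "en");
        return true;
    }
}
```

It would be attached the same way as in the answer, with webOverview.PreRequest += LanguageHeaders.PreferGerman; before calling Load. Using the invariant culture for the number format matters here: on a German system the default formatting would produce "0,9", which is not a valid quality value.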

Upvotes: 7
