Reputation: 165
I am using WebClient to scrape Google Search. I kept getting "Cannot reach this page" until I changed the User-Agent header:
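// URL-encode the search terms so spaces and special characters survive in the query string
string query = Uri.EscapeDataString(my_stocks[order].Symbole + " stock");
string page = string.Format("https://www.google.com/search?q={0}&hl=en", query);
WebClient client = new WebClient();
client.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
string r = client.DownloadString(page);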
However, the returned HTML differs from what I see when I run the same search in Chrome. So I tried changing the header to the User-Agent that Chrome sends (found via https://www.whatismybrowser.com/detect/what-is-my-user-agent), but then I get "Cannot reach this page" again. What am I missing here?
Upvotes: 1
Views: 943
Reputation: 735
My 2 cents ...
Since the rise of single-page applications, web scraping isn't what it used to be, as pages are generally no longer rendered on the server.
It's highly likely that a Google Search results page is delivered using asynchronous REST queries rather than as a single server-side rendered page.
Watch the Network tab in Chrome's developer tools when you do a Google search and you'll likely see many different network requests.
I suggest that you look for a more specific API to deal with the type of request that you're looking to make.
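As one possible illustration of what a "more specific API" could look like: Google offers a Custom Search JSON API that returns results as JSON instead of HTML. The following is only a minimal sketch, assuming you have obtained an API key and a search engine ID from Google (both are placeholders here and do not appear in the question). The class and method names are hypothetical, and whether this API suits stock lookups is something you would need to check yourself.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original question's code.
public static class SearchApiSketch
{
    // Builds the request URL for the Custom Search JSON API.
    // apiKey and engineId are assumed placeholders that you get from Google.
    public static string BuildUrl(string apiKey, string engineId, string query)
    {
        return "https://www.googleapis.com/customsearch/v1"
            + "?key=" + Uri.EscapeDataString(apiKey)
            + "&cx=" + Uri.EscapeDataString(engineId)
            + "&q=" + Uri.EscapeDataString(query);
    }

    // Sends the request and returns the raw JSON response body as a string.
    public static async Task<string> SearchAsync(
        HttpClient http, string apiKey, string engineId, string query)
    {
        using (HttpResponseMessage response =
            await http.GetAsync(BuildUrl(apiKey, engineId, query)))
        {
            // Throws HttpRequestException on a non-success status code.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

With the code from the question, a call might look like `await SearchApiSketch.SearchAsync(new HttpClient(), apiKey, engineId, my_stocks[order].Symbole + " stock")`, and the returned JSON can then be parsed instead of scraping HTML.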
Upvotes: 1