Reputation: 801
I have tried just about every related solution I could find on the Web, but none of them worked. This one does not work either: C# - HttpWebRequest POST (Login to Facebook), since it uses a different method.
I am not using the POST method; my request uses GET. The site does not need any login credentials to serve the image. (Most of the site's other root domains do not require a cookie.)
The code below is part of what I worked out to make the program fetch the image the way the web-based versions do, but it has a few problems.
Before, I tried a plain WebClient to download the image, but the result would not show up in any form that the PictureBox control would accept, so I switched to HttpWebRequest.
The particular root domain of the site where I am trying to get the image from requires a cookie, though.
Below is a code snippet which basically tries to get an image from a site. The only trouble is, it is almost impossible to get the image from the site unless you pass a few things in the HttpWebRequest, along with a cookie.
For now, I am using a static cookie as a temporary workaround.
HttpWebRequest _request = (HttpWebRequest)HttpWebRequest.Create(_URL);
_request.Method = WebRequestMethods.Http.Get;
_request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
_request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
_request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
_request.Headers.Set(HttpRequestHeader.CacheControl, "max-age=0");
_request.Host = "www.habbo" + _Country;
_request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
using (WebResponse _response = _request.GetResponse())
using (Stream _stream = _response.GetResponseStream())
{
    Image _image = Image.FromStream(_stream);
    _bitmap = new Bitmap(_image);
    string contentType = _response.ContentType;
    _PictureBox.Image = _bitmap;
}
Assume the following values for the variables:
_URL = "http://www.habbo.com/habbo-imaging/avatarimage?hb=img&user=aa&direction=2&head_direction=2&size=m&img_format=gif";
_Country = ".com";
Most of the values I am passing into the HttpWebRequest were obtained from the Network tab of Google Chrome's Developer Tools.
The web-based versions of the Habbo Imager seem to just direct people to the page where they can find the image, and their browsers seem to add the cookie somehow. What I am doing is different: all they do is display the page where the image is located, whereas I want to find the image's true location and then read it into an Image.
Apparently the site seems to need the user to "visit" them, according to what I read from this thread: Click here
What I would like to know is, is there a better way to get a valid cookie that the server will happily accept every time?
Or do I need to somehow trick the site into thinking the user has visited the page and seen it, thereby making them maybe return the cookie we might need, even though the user doesn't ever see the page?
Not too sure if this would mean that I need to somehow dynamically generate the cookies though.
I also do not understand how to truly create or get the cookies (and set stored cookies) using C#, so if it is possible, please use some examples.
I would prefer not to use any third-party libraries or to change my code too much. Nor should the program send two GET requests to get what it could get with one. Thus, this would not work: Passing cookie with HttpWebRequest in winforms?
I am using .NET 4.0.
Upvotes: 1
Views: 3642
Reputation: 42434
It is a little more complicated than it first appears. The browser actually makes two calls. The first one returns an HTML page with a small piece of JavaScript that, when executed, sets a cookie and reloads the page. In your C# code you have to mimic that.
In your form class, add an instance variable to hold all the cookies across multiple HttpWebRequest calls:
readonly CookieContainer cookiecontainer = new CookieContainer();
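Since the question asks for an example of creating and reading cookies, here is a minimal sketch of how a CookieContainer stores a cookie and hands it back for a matching URI. The class name, the cookie name and value, and the example.com hosts are placeholders of mine, not anything the site uses.

```csharp
using System;
using System.Net;

public static class CookieContainerDemo
{
    // Builds a container holding one hand-made cookie (name and value are placeholders).
    public static CookieContainer BuildContainer()
    {
        var container = new CookieContainer();
        // Cookie(name, value, path, domain)
        container.Add(new Cookie("example_name", "example_value", "/", "www.example.com"));
        return container;
    }

    public static void Main()
    {
        CookieContainer container = BuildContainer();
        var uri = new Uri("http://www.example.com/some/page");

        // Prints the Cookie header value the container would send for this URI.
        Console.WriteLine(container.GetCookieHeader(uri));

        // Prints how many cookies the container holds for an unrelated host.
        Console.WriteLine(container.GetCookies(new Uri("http://other.example.org/")).Count);
    }
}
```

Assigning one such container to every request is what lets a cookie set for the first call travel along with the second one.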
I have created a Builder method that creates the HttpWebRequest and returns an HttpWebResponse. It takes a NameValueCollection of cookies to add to the CookieContainer.
private HttpWebResponse Builder(string url, string host, NameValueCollection cookies)
{
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
    request.Method = WebRequestMethods.Http.Get;
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
    // request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
    request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
    request.Headers.Set(HttpRequestHeader.CacheControl, "max-age=0");
    request.Host = host;
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
    request.CookieContainer = cookiecontainer;
    if (cookies != null)
    {
        foreach (var cookiekey in cookies.AllKeys)
        {
            request.CookieContainer.Add(
                new Cookie(
                    cookiekey,
                    cookies[cookiekey],
                    @"/",
                    host));
        }
    }
    return (HttpWebResponse) request.GetResponse();
}
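The Accept-Encoding line is commented out in Builder. One plausible reason (my assumption, the code does not say) is that a compressed response would otherwise have to be decompressed by hand before it could be parsed or turned into an image. If you want to accept compressed responses, a sketch like the following lets the framework handle gzip and deflate itself; the class name, method name and URL are placeholders.

```csharp
using System;
using System.Net;

public static class DecompressionDemo
{
    // Creates (but does not send) a request that decompresses gzip/deflate responses automatically.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Http.Get;
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    public static void Main()
    {
        HttpWebRequest request = CreateRequest("http://www.example.com/");
        // Shows which decompression methods are enabled on the request.
        Console.WriteLine(request.AutomaticDecompression);
    }
}
```

With this property set there should be no need to set the Accept-Encoding header manually.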
If the incoming stream turns out to have a text/html content type, we need to parse its content and return the cookie name and value. The Parse method does just that:
// find in the html and return the three parameters in a string array
// setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '127.0.0.1', 10);
private static string[] Parse(Stream _stream, string encoding)
{
    const string setCookieCall = "setCookie('";
    // copy html as string
    var ms = new MemoryStream();
    _stream.CopyTo(ms);
    var html = Encoding.GetEncoding(encoding).GetString(ms.ToArray());
    // find setCookie call
    var findFirst = html.IndexOf(
        setCookieCall,
        StringComparison.InvariantCultureIgnoreCase) + setCookieCall.Length;
    var last = html.IndexOf(");", findFirst, StringComparison.InvariantCulture);
    var setCookieStatmentCall = html.Substring(findFirst, last - findFirst);
    // take the parameters
    var parameters = setCookieStatmentCall.Split(new[] {','});
    for (int x = 0; x < parameters.Length; x++)
    {
        // cleanup
        parameters[x] = parameters[x].Replace("'", "").Trim();
    }
    return parameters;
}
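To see the extraction in isolation, here is a small sketch that applies the same IndexOf/Split approach to a string instead of a stream, using the setCookie call quoted in the comment above as input. The SetCookieParser class and the TryParse name are mine, and unlike Parse it returns null when no setCookie call is found.

```csharp
using System;

public static class SetCookieParser
{
    // Returns the arguments of the first setCookie(...) call, or null if none is found.
    public static string[] TryParse(string html)
    {
        const string marker = "setCookie('";
        int start = html.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += marker.Length;

        int end = html.IndexOf(");", start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Split the argument list and strip quotes and whitespace from each part.
        string[] parts = html.Substring(start, end - start).Split(',');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace("'", "").Trim();
        }
        return parts;
    }

    public static void Main()
    {
        string html = "<script>setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '127.0.0.1', 10);</script>";
        string[] parameters = TryParse(html);
        // parameters[0] is the cookie name, parameters[1] its value.
        Console.WriteLine(parameters[0] + "=" + parameters[1]);
    }
}
```

The first two elements are what the Click handler later stores as the cookie name and value.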
Now that our building blocks are complete, we can start calling our methods from the Click method. We use a loop to call our Builder twice to obtain a result from the given URL. Based on the received content type, we either Parse the stream or create the Image from it.
private void button1_Click(object sender, EventArgs e)
{
    var cookies = new NameValueCollection();
    for (int tries = 0; tries < 2; tries++)
    {
        using (var response = Builder(_URL, "www.habbo" + _Country, cookies))
        {
            using (var stream = response.GetResponseStream())
            {
                string contentType = response.ContentType.ToLowerInvariant();
                if (contentType.StartsWith("text/html"))
                {
                    var parameters = Parse(stream, response.CharacterSet);
                    cookies.Add(parameters[0], parameters[1]);
                }
                if (contentType.StartsWith("image"))
                {
                    using (var image = Image.FromStream(stream))
                    {
                        // copy into a Bitmap so the picture does not depend on
                        // the response stream, which is disposed below
                        pictureBox1.Image = new Bitmap(image);
                    }
                    break; // we're done, get out
                }
            }
        }
    }
}
This code works for the URL in your question. I did not take any measures to handle other patterns or exceptions; it is up to you to add that. Also, when doing this kind of scraping, make sure the owner of the website allows it.
Upvotes: 1