Reputation: 2897
When I visit this website in Firefox 13, I get a page with certain content. But when I use wget to download it:
wget http://tinhvan.com
the downloaded HTML page has different content. I tried setting the user agent:
wget -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1' http://tinhvan.com
but got the same result.
What is happening, and how do I get the same result as when I visit the site in Firefox?
UPDATE
Here is the page source from Firefox (View Source):
<!DOCTYPE html>
<html dir="ltr" lang="vi">
<head id="ctl00_page_header">
<title>
Tinhvan Group - Trang chủ
and here is the page downloaded by wget:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link href="Content/images/main.css" rel="stylesheet" type="text/css" /><link href="Content/images/mail-detail.css" rel="stylesheet" type="text/css" />
<script src="../../Content/JqueryUI/js/jquery-1.3.2.min.js" type="text/javascript"></script>
<title>
Trang chủ - Tinhvan Group Website
Upvotes: 0
Views: 857
Reputation: 3094
Firefox (and not just Firefox; Chrome, IE, etc. do it as well) automatically adds Accept* headers to every request,
e.g.
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US, en;q=0.5
Try:
wget --header="Accept: text/html" -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1' http://tinhvan.com
Note: if you don't declare an Accept header, wget automatically adds Accept: */*, which means "give me anything you have". It seems that the site returns application/xhtml+xml by default, but you expect text/html.
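If a bare Accept: text/html is not enough, one option is to send the same Accept and Accept-Language values the browser sends. The following is only a rough sketch: the function name fetch_like_browser and the WGET override variable are made up for illustration, and it is an assumption that the site varies its response on these headers. Accept-Encoding is left out on purpose, since wget may save a gzip-compressed body as-is instead of decompressing it.

```shell
#!/bin/sh
# Sketch: fetch a page with wget while sending browser-like request headers.
# WGET is a hypothetical override so the command can be dry-run, e.g. WGET=echo.
WGET=${WGET:-wget}

# fetch_like_browser URL OUTFILE  (hypothetical helper, not part of wget)
fetch_like_browser() {
    url=$1   # page to request
    out=$2   # file the response body is written to
    # -S prints the server's response headers, so the returned Content-Type is visible.
    # --header may be repeated; -U sets the User-Agent string.
    "$WGET" -S -O "$out" \
        --header='Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
        --header='Accept-Language: en-US,en;q=0.5' \
        -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1' \
        "$url"
}

# Usage (performs a real request):
#   fetch_like_browser http://tinhvan.com firefox-like.html
```

Comparing the Content-Type line printed by -S with and without the extra headers should show whether the server is really choosing the page based on Accept. If the output still differs from Firefox, cookies or a redirect may be involved instead.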
Upvotes: 1