HVNSweeting

Reputation: 2897

wget doesn't return proper page

When I visit this website in Firefox 13, I get a page with some content. But when I use wget to download it:

wget http://tinhvan.com

the downloaded HTML page has different content. I tried setting the user agent:

wget -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1' http://tinhvan.com

but I got the same result.

What happened, and how do I get the same result as when I visit the site through Firefox?

UPDATE

Here is the source as shown by Firefox (View Source):

<!DOCTYPE html>
<html dir="ltr" lang="vi">
    <head id="ctl00_page_header">
            <title>
                Tinhvan Group - Trang chủ

and here is the page downloaded by wget:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link href="Content/images/main.css" rel="stylesheet" type="text/css" /><link href="Content/images/mail-detail.css" rel="stylesheet" type="text/css" />
    <script src="../../Content/JqueryUI/js/jquery-1.3.2.min.js" type="text/javascript"></script>    
    <title>

    Trang chủ - Tinhvan Group Website

Upvotes: 0

Views: 857

Answers (1)

Filip

Reputation: 3094

Firefox (and not only Firefox: Chrome, IE, and other browsers do it as well) automatically adds Accept* headers to each request.

For example:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US, en;q=0.5

Try:

wget --header="Accept: text/html"  -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1' http://tinhvan.com

Note: if you don't declare an Accept header, wget automatically adds Accept: */*, which means "give me anything you have". It seems that the site returns application/xhtml+xml by default, whereas you expect text/html.
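
The same idea can be sketched in Python for anyone scripting the download instead of calling wget. This is only an illustration: the names `BROWSER_HEADERS`, `build_request`, and `fetch` are made up for this example, the header values are copied from the ones listed above (a given Firefox build may send different ones), and whether the site really switches its output on the Accept header is still the assumption stated in the note.

```python
import urllib.request

# Browser-like headers, copied from the examples in this answer.
# Accept-Encoding is left out on purpose: urllib does not decompress
# gzip/deflate responses automatically.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) "
                  "Gecko/20100101 Firefox/13.0.1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request object carrying the given headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch(url):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch("http://tinhvan.com")` would perform the actual download. If the result still differs from what the browser shows, comparing the full set of request headers (cookies included) sent by each client is a reasonable next step.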

Upvotes: 1
