Louis Thibault

Reputation: 21410

Why does xml retrieved from a site not look like web browser content?

I've been trying to fetch the xml data found here: http://www.thetvdb.com/api/D1BD82E2AE599ADD/mirrors.xml

You'll notice that the xml data is easily read in your web browser. When I try to load it using urllib2, however, I get the output shown below (the code is based on the tutorial found at http://www.doughellmann.com/PyMOTW/urllib2/):

import urllib2
response = urllib2.urlopen('http://www.thetvdb.com/api/D1BD82E2AE599ADD/mirrors.xml')

print response.read()

Output:

'<?xml version="1.0" encoding="UTF-8" ?>\n<Mirrors>\n  <Mirror>\n    <id>1</id>\n    <mirrorpath>http://thetvdb.com</mirrorpath>\n    <typemask>7</typemask>\n  </Mirror>\n</Mirrors>\n'

I have tried other websites (e.g. python.org) and they seem to work. The problem appears to be library independent (I've had the same problem with urllib, httplib, httplib2, ...) and specific to the site I'm trying to fetch.

What gives?

EDIT: okay, it seems as though I was confused as to what I "should" be seeing. Out of curiosity, does anybody know what the "script" section is? I'm viewing the page using google chrome (stable).

Upvotes: 2

Views: 219

Answers (2)

user177800

"It looks nothing like the data that is shown if the page is loaded in a web browser. I'm updating the question with this information.."

When I get that example URL with Chrome I get exactly what you are getting with your Python code, the raw data.

Your browser is auto-magically detecting the XML and formatting it as HTML. What the server sends is "exactly the same" as what Python is getting, which is the raw data. The browser is confusing you as to what you should expect.

NOTE: don't trust what the Developer Tools report. They show you HTML, which in this case is a wrapper that Chrome generates around the response to enable the interactive display of the XML with code folding (JavaScript) and all that other bling. That is not what the server is actually sending you; to see that, use View Source.
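To make the point concrete, here is a minimal sketch in Python 3 (an assumption; the question uses Python 2's urllib2). It takes the response body quoted in the question as a hard-coded bytes literal, so no network request is made, and shows that the raw data is already well-formed XML. The names `raw` and `mirror_paths` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Response body copied from the question's output; hard-coded here so the
# sketch runs without contacting the server.
raw = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n<Mirrors>\n  <Mirror>\n'
    b'    <id>1</id>\n    <mirrorpath>http://thetvdb.com</mirrorpath>\n'
    b'    <typemask>7</typemask>\n  </Mirror>\n</Mirrors>\n'
)

# Printing the decoded text turns the "\n" escapes into real line breaks,
# which is much closer to what the browser displays.
print(raw.decode("utf-8"))

# Parsing it confirms that the server sent plain, well-formed XML.
root = ET.fromstring(raw)
mirror_paths = [m.findtext("mirrorpath") for m in root.findall("Mirror")]
print(root.tag, mirror_paths)
```

The quoted string with literal `\n` sequences in the question is just the string's representation; once printed or parsed, it is the same document the browser renders.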

Upvotes: 6

Charles Duffy

Reputation: 295354

In some cases, a site provides a stylesheet that tells the browser how to transform the raw XML into (X)HTML, so the rendering and the literal content can be very different. However, I don't see that here: what I get (in either Chrome or Firefox) for the URL you gave looks exactly like what your script is giving you, so I don't grok where you're getting a difference.
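One way to check for such a stylesheet is to look for an `xml-stylesheet` processing instruction at the top of the document. The following is a rough sketch in Python 3 using the standard library's `xml.dom.minidom`; the helper name `stylesheet_instructions`, the `styled` sample, and the file name `example.xsl` are made up for illustration and do not come from the site in question.

```python
from xml.dom import minidom


def stylesheet_instructions(xml_bytes):
    """Return the data of every top-level xml-stylesheet processing instruction."""
    doc = minidom.parseString(xml_bytes)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


# Shortened version of the response quoted in the question: no stylesheet.
plain = b'<?xml version="1.0" encoding="UTF-8" ?>\n<Mirrors><Mirror><id>1</id></Mirror></Mirrors>'

# Hypothetical document that does reference an XSLT stylesheet.
styled = b'<?xml version="1.0"?>\n<?xml-stylesheet type="text/xsl" href="example.xsl"?>\n<Mirrors/>'

print(stylesheet_instructions(plain))
print(stylesheet_instructions(styled))
```

An empty result for the response in the question would be consistent with the observation that no site-provided stylesheet is involved here.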

Upvotes: 1
