Reputation: 21410
I've been trying to fetch the xml data found here: http://www.thetvdb.com/api/D1BD82E2AE599ADD/mirrors.xml
You'll notice that the XML data is easily read in your web browser. When I try to load it using urllib2, however, I get something that looks different (based on the tutorial found at http://www.doughellmann.com/PyMOTW/urllib2/):
import urllib2
response = urllib2.urlopen('http://www.thetvdb.com/api/D1BD82E2AE599ADD/mirrors.xml')
print response.read()
Output:
'<?xml version="1.0" encoding="UTF-8" ?>\n<Mirrors>\n <Mirror>\n <id>1</id>\n <mirrorpath>http://thetvdb.com</mirrorpath>\n <typemask>7</typemask>\n </Mirror>\n</Mirrors>\n'
I have tried with other websites (e.g.: python.org) and it seems to work. The problem seems to be library independent (I've had the same problem with urllib, httplib, httplib2, ...) and the problem seems to be specific to the site I'm trying to fetch.
What gives?
EDIT: okay, it seems as though I was confused as to what I "should" be seeing. Out of curiosity, does anybody know what the "script" section is? I'm viewing the page using Google Chrome (stable).
Upvotes: 2
Views: 219
Reputation:
"It looks nothing like the data that is shown if the page is loaded in a web browser. I'm updating the question with this information.."
When I get that example URL with Chrome I get exactly what you are getting with your Python code, the raw data.
Your browser is auto-magically detecting the XML and formatting it as HTML. The underlying response is exactly the same as what Python is getting, which is the raw data. The browser is confusing you about what you should be expecting.
NOTE: don't trust what the Developer Tools show you. In this case they display an HTML wrapper that Chrome generates around the response to enable the interactive display of the XML, with code folding (JavaScript) and all that other bling. That wrapper is not what the server is actually sending you; what the server sends is what you see when you use View Source.
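To convince yourself that the raw string is perfectly good XML, you can feed it to a parser. Here is a minimal sketch, assuming Python 3 and the standard library's xml.etree.ElementTree; the RAW constant is just the output from the question pasted in as bytes (so no network access is needed), and parse_mirrors is a hypothetical helper name, not part of any API.

```python
import xml.etree.ElementTree as ET

# The body from the question's output, as bytes. This stands in for what
# response.read() returned, so the sketch runs without touching the network.
RAW = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b'<Mirrors>\n <Mirror>\n <id>1</id>\n'
    b' <mirrorpath>http://thetvdb.com</mirrorpath>\n'
    b' <typemask>7</typemask>\n </Mirror>\n</Mirrors>\n'
)


def parse_mirrors(raw):
    """Hypothetical helper: turn each <Mirror> element into a dict of its children."""
    root = ET.fromstring(raw)  # root is the <Mirrors> element
    return [
        {child.tag: child.text for child in mirror}
        for mirror in root.findall('Mirror')
    ]


mirrors = parse_mirrors(RAW)
print(mirrors)
# [{'id': '1', 'mirrorpath': 'http://thetvdb.com', 'typemask': '7'}]
```

If the parser accepts the data and you can pull the fields out like this, the fetch worked; the pretty tree in the browser is only a rendering of the same bytes.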
Upvotes: 6
Reputation: 295354
In some cases, a site provides a stylesheet telling the browser how to transform the raw XML into (X)HTML, so the rendering and the literal content can be very different. However -- I don't see that here; what I get (in either Chrome or Firefox) for the URL you gave looks exactly like what your script is giving you, so I don't grok where you're getting a difference.
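A site-provided stylesheet normally shows up as an xml-stylesheet processing instruction near the top of the document, so you can check for one in the raw data. Below is a rough sketch, assuming Python 3 and the standard library's xml.dom.minidom; stylesheet_instructions is a made-up helper name, PLAIN is the data from the question, and STYLED (with its style.xsl reference) is an invented example for contrast, not something this site serves.

```python
from xml.dom import minidom


def stylesheet_instructions(raw):
    """Hypothetical helper: return the data of every top-level
    <?xml-stylesheet ...?> processing instruction in the document."""
    doc = minidom.parseString(raw)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == 'xml-stylesheet'
    ]


# The data from the question: no stylesheet reference at all.
PLAIN = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b'<Mirrors>\n <Mirror>\n <id>1</id>\n </Mirror>\n</Mirrors>\n'
)

# Invented example of a document that does reference a stylesheet.
STYLED = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b'<?xml-stylesheet type="text/xsl" href="style.xsl"?>\n'
    b'<Mirrors/>\n'
)

print(stylesheet_instructions(PLAIN))   # []
print(stylesheet_instructions(STYLED))  # one entry, mentioning style.xsl
```

An empty list for the question's data matches what I see: the server is not asking the browser to transform anything, so any formatting is the browser's own doing.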
Upvotes: 1