Reputation: 7233
I'm having trouble handling a certain redirect with Python. I'm requesting a page that apparently loads and then immediately redirects to ww1.www.com. I'm assuming this is the case because I've tried every method I know of for returning headers/status codes, and I always end up with results that look normal (status code: 200, appropriate host/referrer params, etc.).
Here is what I have:
from BeautifulSoup import BeautifulSoup
import urllib
import psycopg2
import psycopg2.extras

db = psycopg2.connect(
    host='myIP',
    database='myDATABASE',
    user='myUSERNAME',
    password='myPASSWORD',
)
cursor = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cursor.execute("SELECT info FROM table")

for row in cursor:
    url = 'http://www.website.com/' + row['info']
    file_pointer = urllib.urlopen(url)
    html_object = BeautifulSoup(file_pointer)
    # Skip anything that did not come back with a 200
    if file_pointer.getcode() != 200:
        continue
The if statement should skip the rest of the loop body whenever the status code is not 200. However, I get IndexErrors in the code after this section, and when I investigate the URL that triggers the error, I find that it redirects without ever giving me a 302 status code.
Any thoughts as to why I would be getting a 200 status code response while still redirecting? (I've also tried equivalents with urllib2 and httplib) Also, how can I prevent this from happening?
Upvotes: 1
Views: 2355
Reputation: 142216
One thing that doesn't look right: html_object = BeautifulSoup(file_pointer) should operate on the data from urlopen, not the handle, so html_object = BeautifulSoup(file_pointer.read()) is what's wanted here.
For debugging, install requests if you haven't already; it's a great library for this kind of thing.
Then:
import requests

for row in cursor:
    page = requests.get('your url')
    # Each entry in page.history is one redirect hop that requests followed
    for hist in page.history:
        print hist.status_code, hist.url
And see if that throws out anything that's puzzling...
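If the history comes back empty and the final status is still 200, one possibility worth checking is that the redirect happens on the client side, for example through a meta refresh tag, which never shows up as a 3xx status. The sketch below is only an illustration of that check, written in Python 3 syntax with the standard library; the names find_meta_refresh and redirect_chain are made up for this example, and it assumes the response object exposes history, status_code, and url the way a requests response does. It will not catch redirects done in JavaScript.

```python
from html.parser import HTMLParser


class _MetaRefreshFinder(HTMLParser):
    """Collects the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute typically looks like "0; url=http://example.com/"
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        name, _, value = rest.strip().partition("=")
        if name.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def find_meta_refresh(html):
    """Return the meta-refresh target URL found in html, or None."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    return finder.target


def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops
```

With requests installed, something like page = requests.get(url) followed by redirect_chain(page) and find_meta_refresh(page.text) would show both the HTTP-level hops and any meta refresh target in the body. A non-None result from find_meta_refresh on a 200 response would explain a page that "redirects" without a 302.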
Upvotes: 2