user3388884

Reputation: 5078

How can I use BeautifulSoup to get the HTML of a meta-refresh redirect target?

I'm looking at a web page whose head section contains the meta refresh tag below. How can I get the content of the google.com page it redirects to with bs4?

<head>
<meta http-equiv="refresh" content="5;url=http://google.com"/>  
</head>

Thanks!

Upvotes: 1

Views: 5209

Answers (1)

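Call find with the tag name meta and an attrs filter on the one attribute whose value you already know: http-equiv must equal refresh. find returns the first matching element (or None if there is none). Read that element's content attribute, then split the URL out of it. Note that this only gives you the target URL; to get the content of the google.com page you still have to download that URL and pass the response to BeautifulSoup.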

In an interactive session, that looks like this:

>>> from bs4 import BeautifulSoup
>>> fragment = """<head><meta http-equiv="refresh" content="5;url=http://google.com"/></head>"""
>>> soup = BeautifulSoup(fragment, 'html.parser')
>>> element = soup.find('meta', attrs={'http-equiv': 'refresh'})
>>> element
<meta content="5;url=http://google.com" http-equiv="refresh"/>

>>> refresh_content = element['content']
>>> refresh_content
u'5;url=http://google.com'

>>> url = refresh_content.partition('=')[2]
>>> url
u'http://google.com'

Upvotes: 4
