I'm looking at an HTML file with the following <head> section. How can I get the content of the google.com page with bs4?
<head>
<meta http-equiv="refresh" content="5;url=http://google.com"/>
</head>
Thanks!
Use find with the tag name meta and an attrs dict containing the known fixed attribute: http-equiv must have the value refresh. find returns the first such element (or None if there is none); take the value of its 'content' attribute, then parse the URL out of it.
Thus you get:
>>> from bs4 import BeautifulSoup
>>> fragment = """<head><meta http-equiv="refresh" content="5;url=http://google.com"/></head>"""
>>> soup = BeautifulSoup(fragment, 'html.parser')
>>> element = soup.find('meta', attrs={'http-equiv': 'refresh'})
>>> element
<meta content="5;url=http://google.com" http-equiv="refresh"/>
>>> refresh_content = element['content']
>>> refresh_content
'5;url=http://google.com'
>>> url = refresh_content.partition('=')[2]
>>> url
'http://google.com'
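The partition('=') step assumes the content value is exactly "delay;url=target". Real pages also use forms like "0; URL='/next'" (different case, extra spaces, quotes, relative URLs), and the question ultimately asks for the page's content, not just its address. Below is a minimal sketch covering both, under some assumptions: refresh_target and fetch_refresh_target are hypothetical helper names, the regular expression is a simplified approximation of the formats browsers accept rather than the full HTML parsing rules, and fetching uses the standard library's urllib.request.urlopen, which ignores the delay and does not execute JavaScript.

```python
import re
from typing import Optional
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Simplified approximation (an assumption, not the HTML spec algorithm):
# "<delay>; url=<target>", case-insensitive, optional spaces and quotes.
_REFRESH_RE = re.compile(
    r"^\s*\d+(?:\.\d*)?\s*[;,]\s*url\s*=\s*['\"]?([^'\"]*)['\"]?\s*$",
    re.IGNORECASE,
)


def refresh_target(html: str, base_url: str = "") -> Optional[str]:
    """Return the meta-refresh target URL in html, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # http-equiv values are case-insensitive ("Refresh" also counts);
    # bs4 passes None to the function when the attribute is missing.
    meta = soup.find(
        "meta",
        attrs={"http-equiv": lambda v: v is not None and v.lower() == "refresh"},
    )
    if meta is None or not meta.get("content"):
        return None
    match = _REFRESH_RE.match(meta["content"])
    if match is None:
        # e.g. content="5" with no url part (a self-refresh).
        return None
    # Resolve relative targets such as "/next" against the page's own URL.
    return urljoin(base_url, match.group(1).strip())


def fetch_refresh_target(html: str, base_url: str = "") -> Optional[bytes]:
    """Follow the meta refresh immediately and return the raw response body.

    Raises ValueError from urlopen if the target is still relative because
    no base_url was given.
    """
    target = refresh_target(html, base_url)
    if target is None:
        return None
    with urlopen(target) as response:
        return response.read()
```

For example, refresh_target(page_html, "http://example.com/start.html") returns an absolute URL or None, and fetch_refresh_target with the same arguments downloads that URL and returns the bytes, which can be passed back to BeautifulSoup for further parsing. The base URL here is purely illustrative.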