Reputation: 890
I am trying to get a specific element from the HTML DOM that appears when you inspect element but for some reason, this is looking into the pure HTML code that doesn't have the javascript executed. Any ideas? The only thing I do differently from the others is that line to avoid 403 error.
import urllib2
from bs4 import BeautifulSoup as BS
# send a browser-like User-Agent to avoid a 403 error
request = urllib2.Request(url, headers={'User-Agent' : "Mozilla/5.0"})
html = urllib2.urlopen(request).read()
soup = BS(html, 'html.parser')
print soup.find('div', {'class' : 'video'})
Upvotes: 0
Views: 60
Reputation: 176
this is looking into the pure HTML code that doesn't have the javascript executed
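BeautifulSoup does not execute JavaScript. It only parses the markup you hand it, so you're getting the raw webpage exactly as the server sent it, and no script is ever run. Elements that you see in the browser's inspector but that were created by a script will not be in that HTML.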
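To see this without hitting any website, here is a minimal sketch in Python 3 using the standard library's `html.parser`, which is the parser that BeautifulSoup's `'html.parser'` option builds on. The `RAW_HTML` string and the `DivClassCollector` class are made up for illustration: the page contains a script that would create a `div` with class `video` in a browser, but the parser only ever sees the empty placeholder `div`.

```python
from html.parser import HTMLParser

# Hypothetical page: in a browser, the script would inject the "video" div.
RAW_HTML = """
<html><body>
<div id="app"></div>
<script>
  document.getElementById("app").innerHTML = '<div class="video">clip</div>';
</script>
</body></html>
"""


class DivClassCollector(HTMLParser):
    """Record the class attribute of every <div> start tag in the markup."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_classes.append(dict(attrs).get("class"))


collector = DivClassCollector()
collector.feed(RAW_HTML)
collector.close()

# The script body is treated as plain text, so the injected div is never seen.
video_found = "video" in collector.div_classes
print(collector.div_classes)  # [None]  (only the empty placeholder div)
print(video_found)            # False
```

The same thing happens with `soup.find('div', {'class' : 'video'})`: if the element is created by a script, it is not in the markup that gets parsed.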
The only thing I do differently from the others is that line to avoid 403 error
urllib2's default User-Agent string is "Python-urllib/_python_version_". The website you're trying to scrape is probably filtering that user agent; by sending a browser-like one such as "Mozilla/5.0", you make the server return the webpage as if you were visiting it from a browser.
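You can check both strings without making any network request. This is a small sketch in Python 3, where `urllib.request` is the successor of `urllib2`; the `http://example.com/` URL is only a placeholder and is never fetched.

```python
import urllib.request

# The default opener carries the User-Agent that urllib sends when you set none.
opener = urllib.request.build_opener()
default_agent = dict(opener.addheaders)["User-agent"]
print(default_agent)  # something like Python-urllib/3.x

# Same override as in the question; the request is built but not sent.
request = urllib.request.Request(
    "http://example.com/",  # placeholder URL
    headers={"User-Agent": "Mozilla/5.0"},
)
# Request stores header names capitalized as "User-agent".
sent_agent = request.get_header("User-agent")
print(sent_agent)  # Mozilla/5.0
```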
Upvotes: 1