Reputation: 492
I am trying to scrape the table found at https://ark.intel.com/content/www/us/en/ark/search/featurefilter.html?productType=873&1_Filter-Family=595&2_StatusCodeText=4
I tried BeautifulSoup, but it is unable to parse the content inside the "body" tag. I get an empty result when I try to parse the table.
How can I work around this?
Upvotes: 0
Views: 443
Reputation: 143098
This page uses JavaScript to add the data, but BeautifulSoup/LXML can't run JavaScript. If you turn off JavaScript in your browser and load the page, you will see what BeautifulSoup/LXML can get.
You may need Selenium to control a web browser, which can run JavaScript.
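A minimal sketch of the Selenium route, assuming Chrome is installed and that the rendered data ends up in a `<table>` element (I did not verify this for the page in question). The function names `table_rows` and `rendered_html` are made up for illustration.

```python
from bs4 import BeautifulSoup


def table_rows(html):
    """Return each <tr> of the first <table> in html as a list of cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # This is what happens with the raw page: no table in the static HTML.
        return []
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


def rendered_html(url, timeout=20):
    """Load url in headless Chrome and return the HTML once a <table> exists."""
    # Imported lazily so table_rows() also works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the JavaScript inserts a <table>; wait until it appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage would be something like `rows = table_rows(rendered_html(url))`. If the page builds the table from `<div>` elements instead, the wait condition and the parsing would need different selectors.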
Or you can use DevTools in Chrome/Firefox (tab Network) to find the URL that JavaScript (AJAX/XHR) uses to download the data. Then you can try to use this URL with requests and BeautifulSoup.
I found it uses url:
I didn't check whether requests will need special settings (e.g. cookies, headers) to get it.
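A rough sketch of that second approach, under several assumptions: `xhr_url` is whatever request URL you find in the Network tab (it is a parameter here, not a known value), the server may want cookies from the main page plus browser-like headers, and the response is either JSON or an HTML fragment. The names `decode_body`, `fetch_data` and `BROWSER_HEADERS` are hypothetical.

```python
import json

from bs4 import BeautifulSoup

# Assumed headers; the real endpoint may need more, fewer, or different ones.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
}


def decode_body(content_type, text):
    """Parse a response body as JSON or HTML depending on its Content-Type."""
    if "json" in content_type.lower():
        return json.loads(text)
    return BeautifulSoup(text, "html.parser")


def fetch_data(xhr_url, page_url):
    """Request xhr_url the way the page's JavaScript might, then decode it."""
    # Imported lazily so decode_body() can be tried without requests installed.
    import requests

    with requests.Session() as session:
        session.headers.update(BROWSER_HEADERS)
        # Visit the main page first so the session picks up any cookies.
        session.get(page_url, timeout=30)
        response = session.get(xhr_url, headers={"Referer": page_url}, timeout=30)
        response.raise_for_status()
        return decode_body(response.headers.get("Content-Type", ""), response.text)
```

If the result is JSON you get plain dicts/lists and don't need BeautifulSoup at all; if it is HTML you can search the returned soup for the table rows as usual.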
Upvotes: 1
Reputation: 129
You can use Puppeteer to 'control' the dynamic web page and then scrape the rendered HTML with BeautifulSoup.
See the examples here: https://github.com/puppeteer/puppeteer/tree/master/examples
Upvotes: 0