Unable to scrape dynamic web page

Question

I am trying to scrape the table found https://ark.intel.com/content/www/us/en/ark/search/featurefilter.html?productType=873&1_Filter-Family=595&2_StatusCodeText=4

I tried using BeautifulSoup and Soup is unable to parse the info located inside the "body" tag. I get a null output when I try to parse the table.

How can I workaround this?

furas · Accepted Answer

This page uses JavaScript to add data but BeautifulSoup/LXML can't run JavaScript - if you turn off javaScrip in browser and load page then you will see what BeautifulSoup/LXML can get.

You may need Selenium to control web browser which can run JavaScript.

Or you can try to use DevTools in Chrome/Firefox (tab Network) to get url usesJavaScript(AJAX/XHR) to download data. And you can try to use this url withrequestsandBeautifulSoup`

I found it uses url:

https://ark.intel.com/libs/apps/intel/support/ark/advancedFilterSearch?productType=873&1_Filter-Family=595&2_StatusCodeText=4&forwardPath=/content/www/us/en/ark/search/featurefilter.html&pageNo=1

I didn't check if requests will need special settings (ie. cookies, headers) to get it.

Unable to scrape dynamic web page

Answers (2)

Related Questions