Keshav Choudhary

Reputation: 492

Unable to scrape dynamic web page

I am trying to scrape the table found at https://ark.intel.com/content/www/us/en/ark/search/featurefilter.html?productType=873&1_Filter-Family=595&2_StatusCodeText=4

I tried using BeautifulSoup, but it is unable to parse the content located inside the "body" tag. I get null output when I try to parse the table.

How can I work around this?

Upvotes: 0

Views: 443

Answers (2)

furas

Reputation: 143098

This page uses JavaScript to add the data, but BeautifulSoup/lxml can't run JavaScript. If you turn off JavaScript in the browser and load the page, you will see what BeautifulSoup/lxml can get.

You may need Selenium to control a web browser, which can run JavaScript.
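
A minimal sketch of the Selenium route, not tested against this site. It assumes Selenium 4 with Chrome available locally, and it assumes the results show up as elements matching `table tr`, which I did not verify. `wait_for_rows` and `scrape_with_selenium` are names made up for this example. The wait is a plain polling loop to keep the idea visible; Selenium's own `WebDriverWait` does the same job.

```python
import time


def wait_for_rows(driver, selector="table tr", timeout=20.0, poll=0.5):
    """Poll until at least one element matches `selector`, then return the page HTML.

    `driver` is any object with Selenium's `find_elements(by, value)` method
    and `page_source` attribute. The default selector is an assumption.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # "css selector" is the locator strategy string behind By.CSS_SELECTOR.
        if driver.find_elements("css selector", selector):
            return driver.page_source
        time.sleep(poll)
    raise TimeoutError(f"no element matched {selector!r} within {timeout} s")


def scrape_with_selenium(url):
    """Open `url` in Chrome, wait for JavaScript to add the rows, return the HTML."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and a matching driver are available
    try:
        driver.get(url)
        return wait_for_rows(driver)
    finally:
        driver.quit()
```

The HTML returned by `scrape_with_selenium(url)` already contains whatever JavaScript added, so it can be passed to `BeautifulSoup` as usual.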

Or you can use DevTools in Chrome/Firefox (Network tab) to find the URL that JavaScript (AJAX/XHR) uses to download the data, and then try to use this URL with `requests` and `BeautifulSoup`.

I found it uses url:

https://ark.intel.com/libs/apps/intel/support/ark/advancedFilterSearch?productType=873&1_Filter-Family=595&2_StatusCodeText=4&forwardPath=/content/www/us/en/ark/search/featurefilter.html&pageNo=1

I didn't check whether requests needs special settings (e.g. cookies, headers) to get it.
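
A rough sketch of how that URL could be used with `requests` and `BeautifulSoup`, again untested. The endpoint and parameters are copied from the URL above. Everything else is an assumption: that `pageNo` selects the page, that a browser-like `User-Agent` header is enough, and that the response is HTML containing `<tr>` rows. `build_url` and `fetch_rows` are names made up for this example.

```python
from urllib.parse import urlencode

# Endpoint and query parameters copied from the URL seen in the Network tab.
ENDPOINT = "https://ark.intel.com/libs/apps/intel/support/ark/advancedFilterSearch"
BASE_PARAMS = {
    "productType": "873",
    "1_Filter-Family": "595",
    "2_StatusCodeText": "4",
    "forwardPath": "/content/www/us/en/ark/search/featurefilter.html",
}


def build_url(page_no=1):
    """Return the AJAX URL for one page of results."""
    params = dict(BASE_PARAMS, pageNo=str(page_no))
    # safe="/" leaves the slashes in forwardPath unescaped, as in the original URL.
    return ENDPOINT + "?" + urlencode(params, safe="/")


def fetch_rows(page_no=1):
    """Download one page and return the cell text of every table row found."""
    # Third-party imports are local so build_url works without them installed.
    import requests
    from bs4 import BeautifulSoup

    # Assumption: a browser-like User-Agent may be needed; cookies may be too.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(build_url(page_no), headers=headers, timeout=30)
    response.raise_for_status()

    # Assumption: the response is an HTML fragment with <tr>/<td> rows.
    soup = BeautifulSoup(response.text, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows
```

Calling `fetch_rows(1)` would then give a list of rows, each a list of cell strings. If it comes back empty or the request is rejected, compare the request headers and cookies in DevTools with what `requests` sends.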

Upvotes: 1

Chris L

Reputation: 129

You can use Puppeteer to control the dynamic web page and then scrape the rendered HTML with BeautifulSoup.

See here: https://github.com/puppeteer/puppeteer/tree/master/examples

Upvotes: 0
