weizer

Reputation: 1117

Unable to scrape data from a website using IMPORTHTML or IMPORTXML in Google Sheets

enter image description here

I want to scrape the data shown in the red box in the screenshot above into Google Sheets. I tried both IMPORTHTML and IMPORTXML, but neither works (the output is empty).

This is my Google Sheet:

https://docs.google.com/spreadsheets/d/1ELo3iA4RmhUuFq7YEfsCVt2iuURFxc1Crdng7rLovTo/edit#gid=0

I'm not sure whether it is possible to scrape the data from this website (https://stockrow.com/AAPL) with IMPORTHTML or IMPORTXML. If it isn't, can Google Apps Script do it?

Upvotes: 1

Views: 851

Answers (1)

NightEye

Reputation: 11184

As the comments already mentioned, Sheets and Apps Script cannot scrape this kind of site, because its content is generated dynamically by JavaScript. Both only fetch the raw HTML the server returns and never run the page's scripts.

People who scrape this kind of site mostly use Selenium in Python. Selenium automates a real browser, so the page's JavaScript runs before the content is read.
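For anyone curious what that browser automation looks like, here is a minimal sketch, not a tested scraper for this particular site. It assumes Selenium 4 and a local Chrome install. The function names and the CSS selector you pass in are hypothetical; you would have to inspect the rendered page to find a selector that matches the data you want.

```python
import time


def wait_for_rendered_html(driver, url, css_selector, timeout=15.0, poll=0.5):
    """Load `url` in an automated browser and return the page HTML once an
    element matching `css_selector` exists, i.e. after JavaScript has rendered it."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # "css selector" is the locator strategy string behind Selenium's By.CSS_SELECTOR.
        if driver.find_elements("css selector", css_selector):
            return driver.page_source
        time.sleep(poll)
    raise TimeoutError(
        f"No element matching {css_selector!r} appeared within {timeout}s"
    )


def scrape_with_chrome(url, css_selector):
    """Convenience wrapper; needs `pip install selenium` and Chrome installed."""
    # Imported lazily so the helper above can be used with any driver-like object.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        return wait_for_rendered_html(driver, url, css_selector)
    finally:
        driver.quit()  # always close the browser, even on timeout
```

A call such as `scrape_with_chrome("https://stockrow.com/AAPL", "table")` would return the rendered HTML, assuming the data really sits in a `<table>` element, which I have not verified. You would still need to parse that HTML and push the values into the sheet yourself.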

I know this might be useless information for you since Google App Engine isn't a tag, but it might help anyone else who runs into this issue and is already familiar with Selenium in Python.

Running Selenium in Google App Engine can be a solution, but if you don't want to invest time in learning Python together with Google App Engine, I recommend you steer clear of this. References that shed light on the issue are listed at the bottom.

Alternative:

  • The best way to overcome the issue without investing too much time is to find an alternative site that provides the same data and whose content isn't generated by JavaScript.
  • One way to check whether a site is JS generated is to view the page source. If the text you want to scrape is in the source code, then that text isn't JavaScript generated.
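The page-source check can also be scripted. This is a rough sketch using only the Python standard library; the function names and the `User-Agent` value are my own choices, not anything the site requires. It fetches the raw HTML exactly as the server sends it (no JavaScript is run) and looks for a piece of text you can see in the browser.

```python
import urllib.request


def fetch_page_source(url, timeout=10.0):
    """Return the raw HTML the server sends, before any JavaScript runs."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def appears_in_source(page_source, expected_text):
    """Case-insensitive check for text you can see on the rendered page."""
    return expected_text.casefold() in page_source.casefold()
```

For example, `appears_in_source(fetch_page_source(url), "Revenue")` returning `False` for a label that is visible in the browser suggests the label is added by JavaScript, so IMPORTHTML and IMPORTXML will not see it. A `True` result is weaker evidence, since the text could also sit inside an embedded script rather than in the HTML markup.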

Reference:

Upvotes: 1
