Reputation: 11
I'm doing some analysis which requires me to save table data and (hyperlinked) links to lots of PDF's from a webpage (https://www.asx.com.au/asx/v2/statistics/prevBusDayAnns.do).
I've been playing around with the =IMPORTHTML
and =IMPORTXML
formulas in Google Sheets and have managed to extract the table data using =IMPORTHTML(A1,"table",1)
, but I'm struggling to extract the "Price sens." column which contains images or the hyperlinks attached to the "Headline" items. I'm having no luck with IMPORTXML
so far, and can't seem to find any solutions online.
Upvotes: 1
Views: 506
Reputation: 8346
The formula for IMPORTXML
you're looking for is:
=IMPORTXML("https://www.asx.com.au/asx/v2/statistics/prevBusDayAnns.do","//*[@id='content']/div/announcement_data/table/tbody/tr")
You need to provide an XPATH, which you can get by clicking on an element in the browser dev tools and selecting copy > XPATH
.
Unfortunately, while this does produce output, it's just the same as for IMPORTHTML
. The price sensitivity column is always empty, too.
The reason for this is, that the content of the price sensitivity columns is not text, but an image, as you can see in your screenshots.
So it looks like you need some more powerful HTML parsing tools here than Google Sheets provides. It would be easy to look for img tags if you parsed the website using Python and Beautifulsoup, for instance. So you may want to go down this route.
Here's what I got using IMPORTXML
, same as you:
The problem is that price sensitivity is img not text:
Upvotes: 1