Reputation: 75
I'm trying to do web scraping with importXML in Google Spreadsheet, reading the content of this page:
What I need to do is select the list below "Lista de Procesos" and separate it into rows. I went to the page, inspected the element, and copied the XPath:
//*[@id="node-page-442"]/div[1]/div/div/div/p[5]
Resulting in this code:
=importxml("http://ddp.usach.cl/node/442";"//*[@id='node-page-442']/div[1]/div/div/div/p[7]/text()")
However, when I try to load it I get an #N/A error:
"Imported content is empty"
Upvotes: 0
Views: 236
Reputation: 29022
One path to get the nodes following the h4 element with the content "Lista de Procesos" is:
//article[@id='node-page-442']/div[contains(@class, 'content')]/div[contains(@class, 'field-name-body')]/div[@class='field-items']/div[contains(@class,'field-item')]/h4[contains(text(), 'Lista de Procesos')]/following-sibling::*
The retrieved children are complete but not structured. If you can use XSLT 2.0, you could structure them by using for-each-group with group-starting-with='strong'. But this is only one possibility.
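To illustrate what grouping with group-starting-with='strong' would produce, here is a minimal Python sketch that imitates it with the standard library instead of XSLT. The SAMPLE markup, the helper name group_starting_with and the document names are hypothetical; the real page may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the siblings that follow the h4;
# the real page's structure is an assumption here.
SAMPLE = """
<div>
  <strong>Proceso A</strong>
  <a href="a1.pdf">Documento 1</a>
  <a href="a2.pdf">Documento 2</a>
  <strong>Proceso B</strong>
  <a href="b1.pdf">Documento 3</a>
</div>
"""

def group_starting_with(elements, start_tag):
    """Open a new group at every element named start_tag.

    Like XSLT's group-starting-with, the first element always opens a
    group, even if it is not a start_tag element.
    """
    groups = []
    for el in elements:
        if el.tag == start_tag or not groups:
            groups.append([])
        groups[-1].append(el)
    return groups

root = ET.fromstring(SAMPLE)
groups = group_starting_with(list(root), "strong")
# One row per group: the heading text first, then the items under it.
rows = [[(el.text or "").strip() for el in group] for group in groups]
print(rows)
```

Each inner list corresponds to one row: the strong heading followed by the elements that belong to it.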
The expression could be reduced to this simpler form:
//h4[contains(text(),'Lista de Procesos')]/following-sibling::*
Maybe this suits your needs better.
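As a rough way to see what the short expression selects, here is a Python sketch using only the standard library. ElementTree does not support the following-sibling axis or contains(), so the helper following_siblings (a hypothetical name) emulates them by walking each parent's children. The SAMPLE markup is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE = """
<div>
  <p>Intro</p>
  <h4>Lista de Procesos</h4>
  <strong>Proceso A</strong>
  <a href="a1.pdf">Documento 1</a>
  <strong>Proceso B</strong>
  <a href="b1.pdf">Documento 2</a>
</div>
"""

def following_siblings(root, tag, needle):
    """Return the elements that come after the first <tag> element
    whose text contains needle, within the same parent."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == tag and needle in (child.text or ""):
                return children[i + 1:]
    return []

root = ET.fromstring(SAMPLE)
siblings = following_siblings(root, "h4", "Lista de Procesos")
labels = [(el.tag, el.text) for el in siblings]
print(labels)
```

In the spreadsheet itself, the short expression would go in as the second argument, for example =importxml("http://ddp.usach.cl/node/442";"//h4[contains(text(),'Lista de Procesos')]/following-sibling::*"). Whether it fills separate rows depends on the page's actual markup.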
Upvotes: 1