XPath with IMPORTXML in Google Spreadsheet

Question

I'm trying to do web scraping with IMPORTXML in Google Spreadsheet, reading the content of this page:

http://ddp.usach.cl/procesos-de-seleccion-internos

What I need to do is select the list below "Lista de Procesos" and split it into rows. I went to the page, inspected the element, and copied the XPath:

//*[@id="node-page-442"]/div[1]/div/div/div/p[5]

Resulting in this code:

=importxml("http://ddp.usach.cl/node/442";"//*[@id='node-page-442']/div[1]/div/div/div/p[7]/text()")

However, when I try to load it, I get an #N/A error:

"Imported content is empty"

zx485 · Accepted Answer

One path that selects the nodes following the h4 element containing "Lista de Procesos" is:

//article[@id='node-page-442']/div[contains(@class, 'content')]/div[contains(@class, 'field-name-body')]/div[@class='field-items']/div[contains(@class,'field-item')]/h4[contains(text(), 'Lista de Procesos')]/following-sibling::*

The retrieved nodes are complete, but not structured. If you can use XSLT 2.0, you could structure them with xsl:for-each-group and group-starting-with='strong'. But this is only one possibility.

The expression can be shortened to:

//h4[contains(text(),'Lista de Procesos')]/following-sibling::*

Maybe this suits your needs better.
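
Plugged into the formula from the question, the shortened expression would look roughly like the sketch below. It assumes the same URL and the `;` argument separator used in the question (some spreadsheet locales use `,` instead), and that the page still contains an h4 with that text.

```
=IMPORTXML("http://ddp.usach.cl/node/442";"//h4[contains(text(),'Lista de Procesos')]/following-sibling::*")
```

Each matched sibling element should then come back as its own row, which is close to the row-per-item layout the question asks for.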
