thanks_in_advance

Reputation: 2743

xpath find inside script tag

I'm using Google Sheets to extract content from THIS PAGE using xpath.

Using importXML(), I am able to extract HTML nodes easily via xpath, e.g., using: //*[@id='result_listing_1_0']/div[1]

However, when I try to extract something that's inside a script tag, I get an error (e.g., when using xpath such as //*[@id='exam_info_window_content_0_0'] ). In this case, the ID is inside a script tag.

How can I use xpath to extract HTML that's inside a script tag in a web page's source?

Update: here's an example of the output I want:

Notes for students:

Students must present a valid/legible photo ID before each appointment. Electronic devices are not permitted during appointments unless otherwise stated in exam instructions (no cell phones; cell phones may not be used as calculators). Students must leave cell phones at home, in a locked car, or in the care of the proctor. All appointments must be confirmed in advance.

Fee details:

computer-based exam - $40 for two hours paper-based exam - $30 for two hours

Website:

http://www.csun.edu/testing (without escape characters this is: http:www.csun.edu/testing)


Notes for students:

Students must present a valid/legible photo ID before each appointment. Electronic devices are not permitted during appointments unless otherwise stated in exam instructions (no cell phones; cell phones may not be used as calculators). Students must leave cell phones at home, in a locked car, or in the care of the proctor. All appointments must be confirmed 24 hours in advance.

Fee details:

25$ covers a single visit. Multiple tests may be taken at one visit. Free parking.

Website:

http://www.spectrumlearningcenters.com (without escape characters this is: www.spectrumlearningcenters.com)

The output will be extracted from the map markers on the page.

Upvotes: 1

Views: 1942

Answers (1)

Quentin

Reputation: 943230

Script elements contain only text nodes.

You would need to either match the text (with contains) or get the entire text node, extract the HTML from it, parse that HTML into a DOM and then run XPath on the new DOM.
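As a rough illustration of the second route, here is a minimal Python sketch using only the standard library. Everything about the sample page is an assumption: the `PAGE` string, the `info` variable and the example.com URL are made up, and the real page may embed its HTML in the script differently. The helper names (`ScriptText`, `extract_fragment`, `xpath_in_script`) are hypothetical too. It also assumes the embedded HTML is well-formed enough for `xml.etree.ElementTree`, which only supports a small XPath subset; messier markup would need a more forgiving HTML parser.

```python
import json
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page source: the markup of interest only exists as an
# escaped JavaScript string inside a <script> element.
PAGE = r'''<html><body>
<div id="result_listing_1_0"><div>Listing</div></div>
<script>
var info = "<div id=\"exam_info_window_content_0_0\"><b>Website:<\/b> http:\/\/www.example.com\/testing<\/div>";
</script>
</body></html>'''


class ScriptText(HTMLParser):
    """Collect the raw text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content arrives as plain text, never as child elements.
        if self.in_script:
            self.scripts[-1] += data


def extract_fragment(page, element_id):
    """Return the unescaped HTML string that mentions element_id, or None."""
    parser = ScriptText()
    parser.feed(page)
    parser.close()
    for script in parser.scripts:
        if element_id not in script:  # plain text match, like contains()
            continue
        # Find double-quoted string literals, allowing backslash escapes.
        for literal in re.findall(r'"(?:[^"\\]|\\.)*"', script):
            if element_id in literal:
                # Assumes the literal only uses JSON-compatible escapes.
                return json.loads(literal)
    return None


def xpath_in_script(page, element_id):
    """Parse the extracted HTML into a new tree and query it by id."""
    fragment = extract_fragment(page, element_id)
    if fragment is None:
        return None
    # Wrap in a root so the descendant search can also match the outer element.
    root = ET.fromstring("<root>" + fragment + "</root>")
    node = root.find(".//*[@id='%s']" % element_id)
    return None if node is None else "".join(node.itertext())


print(xpath_in_script(PAGE, "exam_info_window_content_0_0"))
```

For the first route, an expression along the lines of `//script[contains(text(), 'exam_info_window_content_0_0')]` selects the script element itself, but what you get back is still the whole script text, so the string extraction step above is needed either way.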

Upvotes: 2
