Reputation: 2743
I'm using Google Sheets to extract content from THIS PAGE using xpath.
Using importXML(), I am able to extract HTML nodes easily via xpath, e.g., using: //*[@id='result_listing_1_0']/div[1]
However, when I try to extract something that's inside a script tag, I get an error (e.g., when using xpath such as //*[@id='exam_info_window_content_0_0']). In this case, the ID is inside a script tag.
How can I use xpath to extract HTML that's inside a script tag in a web page's source?
Update: here's an example of the output I want:
Notes for students:
Students must present a valid/legible photo ID before each appointment. Electronic devices are not permitted during appointments unless otherwise stated in exam instructions (no cell phones; cell phones may not be used as calculators). Students must leave cell phones at home, in a locked car, or in the care of the proctor. All appointments must be confirmed in advance.
Fee details:
computer-based exam - $40 for two hours paper-based exam - $30 for two hours
Website:
http://www.csun.edu/testing (without escape characters this is: http:www.csun.edu/testing)
Notes for students:
Students must present a valid/legible photo ID before each appointment. Electronic devices are not permitted during appointments unless otherwise stated in exam instructions (no cell phones; cell phones may not be used as calculators). Students must leave cell phones at home, in a locked car, or in the care of the proctor. All appointments must be confirmed 24 hours in advance.
Fee details:
25$ covers a single visit. Multiple tests may be taken at one visit. Free parking.
Website:
http://www.spectrumlearningcenters.com (without escape characters this is: www.spectrumlearningcenters.com)
The output will be extracted from the map markers on the page:
Upvotes: 1
Views: 1942
Reputation: 943230
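Script elements contain only text nodes, so the markup inside them is never parsed into elements, and an ID that appears there cannot be matched as an attribute.
You would need to either match the text (with contains()) or get the entire text node, extract the HTML from it, parse that HTML into a DOM and then run XPath on the new DOM.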
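The second approach can't be done inside importXML() itself, but here is a rough sketch of it in Python using only the standard library. The page source below is made up for illustration (the real page's script layout is not shown in the question), and it assumes the HTML sits in a JSON-style double-quoted string and is well-formed enough for an XML parser. The helper names embedded_html and find_by_id are hypothetical. For messy real-world markup a forgiving HTML parser would probably be needed instead of ElementTree, which also supports only a subset of XPath.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical page source: the markup we want exists only as a string
# inside a <script> element, so it is not part of the page's DOM.
PAGE = r'''<html><body>
<script>
var markers = [{"html": "<div id=\"exam_info_window_content_0_0\"><b>Website:<\/b> <a href=\"http:\/\/www.csun.edu\/testing\">link<\/a><\/div>"}];
</script>
</body></html>'''

# Shortcut: grab the text of each <script> element with a regex.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.S | re.I)
# A double-quoted string literal, allowing backslash escapes inside it.
STRING_RE = re.compile(r'"((?:[^"\\]|\\.)*)"', re.S)


def embedded_html(page):
    """Yield unescaped string literals from scripts that look like markup."""
    for script in SCRIPT_RE.findall(page):
        for literal in STRING_RE.findall(script):
            try:
                # Let the JSON decoder undo escapes such as \" and \/
                text = json.loads('"' + literal + '"')
            except ValueError:
                continue  # not a valid JSON-style string, skip it
            if text.lstrip().startswith("<"):
                yield text


def find_by_id(page, node_id):
    """Parse each embedded fragment and look up an element by its id."""
    for fragment in embedded_html(page):
        try:
            # Wrap in a dummy root so the fragment's top-level element
            # is also reachable with a descendant (.//) search.
            root = ET.fromstring("<root>" + fragment + "</root>")
        except ET.ParseError:
            continue  # fragment is not well-formed XML, skip it
        node = root.find(".//*[@id='%s']" % node_id)
        if node is not None:
            return node
    return None


node = find_by_id(PAGE, "exam_info_window_content_0_0")
print("".join(node.itertext()))        # text content of the matched div
print(node.find(".//a").get("href"))   # unescaped URL from the link
```

The XPath from the question now works because it runs against the newly parsed fragment, not against the original page.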
Upvotes: 2