Reputation: 2853
I'm trying to "select" the link from the onclick attribute in the following portion of HTML:
<span onclick="Javascript:document.quickFindForm.action='/blah_blah'"
class="specialLinkType"><img src="blah"></span>
but can't get any further than the following XPath:
//span[@class="specialLinkType"]/@onclick
which only returns:
Javascript:document.quickFindForm.action
Any ideas on how to pick out the link assigned to quickFindForm.action
with an XPath?
Upvotes: 3
Views: 2285
Reputation: 957
I used XQuery, but the same approach should work in XPath 2.0. It relies on the XPath function "tokenize", which splits a string on a regular expression (http://www.xqueryfunctions.com/xq/fn_tokenize.html). In this case I split the string on the single quote ( ' ):
xquery version "1.0";
let $x := //span[@class="specialLinkType"]/@onclick
let $c := fn:tokenize( $x, '''' )
return $c[2]
In XPath that should be:
fn:tokenize(//span[@class="specialLinkType"]/@onclick, '''' )[2]
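Note that fn:tokenize is an XPath 2.0 function, so an XPath 1.0 engine (such as the one bundled with the JDK in javax.xml.xpath) will reject it. A minimal sketch of the same idea under that constraint: read the attribute with plain XPath, then split on the single quote in the host language. The class and method names here (TokenizeFallbackDemo, secondToken) are hypothetical and only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TokenizeFallbackDemo {
    // Hypothetical helper: mimics fn:tokenize(@onclick, '''')[2] without XPath 2.0.
    static String secondToken(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Plain XPath 1.0: fetch the whole onclick attribute value as a string.
            String onclick = XPathFactory.newInstance().newXPath()
                    .evaluate("//span[@class='specialLinkType']/@onclick", doc);
            // Split on the single quote; XPath's 1-based [2] is index 1 here.
            String[] tokens = onclick.split("'");
            return tokens.length > 1 ? tokens[1] : "";
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<span onclick=\"Javascript:document.quickFindForm.action='/blah_blah'\""
                + " class=\"specialLinkType\"><img src=\"blah\"/></span>";
        System.out.println(secondToken(xml));
    }
}
```

This assumes the markup is well-formed XML (note the self-closed img element); real-world HTML would likely need an HTML-aware parser first.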
Upvotes: 0
Reputation: 1701
If Scrapy supports XPath string functions, this will work:
substring-before(
substring-after(
//span[@class="specialLinkType"]/@onclick,"quickFindForm.action='")
,"'")
It looks like it also supports regex. Something like this should work:
.select('//span[@class="specialLinkType"]/@onclick').re(r'quickFindForm.action=\'(.*?)\'')
Caveat: I can't test the second solution, and you will have to check that \'
is the proper escape sequence for single quotes in this case.
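The substring-before/substring-after expression is plain XPath 1.0, so it can be checked outside Scrapy. Here is a rough sketch using the JDK's built-in javax.xml.xpath; the class and method names (SubstringXPathDemo, extractLink) are made up for illustration, and the input is assumed to be well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SubstringXPathDemo {
    // Take what follows "quickFindForm.action='" and cut it at the next single quote.
    static final String EXPR =
            "substring-before("
            + "substring-after(//span[@class='specialLinkType']/@onclick,"
            + " \"quickFindForm.action='\"),"
            + " \"'\")";

    // Hypothetical helper: evaluates EXPR against an XML string and returns the link.
    static String extractLink(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // evaluate(String, Object) returns the result converted to a string.
            return XPathFactory.newInstance().newXPath().evaluate(EXPR, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<span onclick=\"Javascript:document.quickFindForm.action='/blah_blah'\""
                + " class=\"specialLinkType\"><img src=\"blah\"/></span>";
        System.out.println(extractLink(xml)); // prints /blah_blah
    }
}
```

If no matching span exists, or the attribute lacks the expected prefix, the expression evaluates to an empty string rather than raising an error.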
Upvotes: 0
Reputation: 556
I tried the XPath in a Java application and it worked OK, returning the whole attribute value:
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Teste {

    public static void main(String[] args) throws Exception {
        Document doc = stringToDom("<span onclick=\"Javascript:document.quickFindForm.action='/blah_blah'\" class=\"specialLinkType\"><img src=\"blah\"/></span>");
        XPath newXPath = XPathFactory.newInstance().newXPath();
        XPathExpression xpathExpr = newXPath.compile("//span[@class=\"specialLinkType\"]/@onclick");
        String result = xpathExpr.evaluate(doc);
        System.out.println(result);
    }

    public static Document stringToDom(String xmlSource) throws SAXException, ParserConfigurationException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xmlSource)));
    }
}
Result:
Javascript:document.quickFindForm.action='/blah_blah'
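Since the XPath returns the whole attribute value, the link itself still has to be cut out of that string. One possible way, sketched here with java.util.regex, is to capture whatever sits between the single quotes after quickFindForm.action=. The class and method names (ActionLinkRegexDemo, linkFrom) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ActionLinkRegexDemo {
    // Group 1 captures the text between the quotes after "quickFindForm.action=".
    private static final Pattern ACTION =
            Pattern.compile("quickFindForm\\.action='([^']*)'");

    // Hypothetical helper: returns the captured link, or null when nothing matches.
    static String linkFrom(String onclick) {
        Matcher m = ACTION.matcher(onclick);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String onclick = "Javascript:document.quickFindForm.action='/blah_blah'";
        System.out.println(linkFrom(onclick)); // prints /blah_blah
    }
}
```

Returning null on a miss is only one choice; an Optional or an exception may suit calling code better.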
Upvotes: 1