Getting XPath attributes with Scrapy

Question

I am parsing an XML document with Scrapy and having trouble with the XPaths.

My XML looks like this:

I need the URL that follows rdf:about=. I am using Scrapy's remove_namespaces() feature so that I don't need namespaces in my XPath. I have tried the following XPaths, but they all return []:

xxs.select('//record/metadata/RDF/Resource/screen/Image/about').extract()

xxs.select('//record/metadata/RDF/Resource/screen/Image/@about').extract()

xxs.select('//record/metadata/RDF/Resource/screen/Image[@about]').extract()

xxs.select('//record/metadata/RDF/Resource/screen[@about]').extract()

xxs.select('//record/metadata/RDF/Resource/screen/@about').extract()

And many other similar variations.

I know that the path up to '//record/metadata/RDF/Resource/screen/Image' is correct because it outputs data, but, as I said, the ones above that try to access the "rdf:about" attribute all come up with []. I don't think namespaces are the issue, since I removed them, but I could be wrong.

R.J. · Accepted Answer

If you are still looking for the XPath to the attribute:

//record/metadata/RDF/Resource/screen/Image/attribute::rdf:about

I haven't tested it, but something similar should pull the attribute.

You can read more about XPath axes at http://www.w3schools.com/xpath/xpath_axes.asp
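
One possible reason the bare @about variants return [] is that the attribute still carries its namespace even though the element names no longer do. The sketch below illustrates that idea with Python's standard xml.etree.ElementTree rather than Scrapy, so it is not the asker's setup. The sample XML is hypothetical (the question's real document is not shown), and only the rdf namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

# Standard RDF syntax namespace URI.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample: the structure and URL are made up for illustration,
# because the XML from the question is not available.
SAMPLE = f"""<record>
  <metadata>
    <rdf:RDF xmlns:rdf="{RDF_NS}">
      <Resource>
        <screen>
          <Image rdf:about="http://example.com/image.jpg"/>
        </screen>
      </Resource>
    </rdf:RDF>
  </metadata>
</record>"""

root = ET.fromstring(SAMPLE)

# ElementTree spells namespaced names as {uri}local, so the RDF element
# has to be qualified in the path.
image = root.find(f"./metadata/{{{RDF_NS}}}RDF/Resource/screen/Image")

# A bare attribute name does not match an attribute that is in a namespace.
bare = image.get("about")

# The namespace-qualified name does match.
qualified = image.get(f"{{{RDF_NS}}}about")

print(bare)
print(qualified)
```

If the same thing is happening in Scrapy, the attribute has to be addressed with its namespace, which is what the rdf:about form in the XPath above is aiming at. That form only resolves if the rdf prefix is known to the selector.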
