Reputation: 33
I am parsing an XML document with Scrapy and having trouble with the XPaths.
My XML looks like this:
<sdn:screen>
<foaf:Image rdf:about="http://search.shinrokuden.irides.tohoku.ac.jp/shinrokuden/archive/screen/07f9d1a0-5ef4-11e2-91ca-000c2923bf22.jpg"/>
</sdn:screen>
I need the URL following rdf:about=. I am using Scrapy's remove_namespaces() feature so that I don't need to use namespaces in my XPath. I have tried the following XPaths, but they all return []:
xxs.select('//record/metadata/RDF/Resource/screen/Image/about').extract()
xxs.select('//record/metadata/RDF/Resource/screen/Image/@about').extract()
xxs.select('//record/metadata/RDF/Resource/screen/Image[@about]').extract()
xxs.select('//record/metadata/RDF/Resource/screen[@about]').extract()
xxs.select('//record/metadata/RDF/Resource/screen/@about').extract()
And many other similar variations.
I know that the path up to '//record/metadata/RDF/Resource/screen/Image' is correct because it outputs data, but, as I said, the ones above that try to access the "rdf:about" attribute all come back as []. I really don't think namespaces are the issue, since I removed them, but I could be wrong.
Upvotes: 1
Views: 2306
Reputation: 48
If you are still looking for the XPath to the attribute:
//record/metadata/RDF/Resource/screen/Image/attribute::rdf:about
I haven't tested it, but something similar should pull the attribute.
You can read more about XPath axes at http://www.w3schools.com/xpath/xpath_axes.asp
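To show why the attribute name matters, here is a minimal sketch that uses Python's standard-library ElementTree rather than Scrapy. It assumes the usual namespace URIs for the rdf: and foaf: prefixes, and the sdn: URI is a placeholder because the question does not show the real declaration. The point is that the attribute is stored under its namespace-qualified name, so a plain "about" lookup finds nothing.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for the rdf: and foaf: prefixes.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# The sdn namespace URI is a placeholder (assumption); the question does not
# show the real xmlns:sdn declaration.
SAMPLE = (
    '<sdn:screen xmlns:sdn="http://example.org/sdn#" '
    'xmlns:foaf="%s" xmlns:rdf="%s">'
    '<foaf:Image rdf:about="http://search.shinrokuden.irides.tohoku.ac.jp/'
    'shinrokuden/archive/screen/07f9d1a0-5ef4-11e2-91ca-000c2923bf22.jpg"/>'
    '</sdn:screen>'
) % (FOAF_NS, RDF_NS)


def image_urls(xml_text):
    """Return the rdf:about value of every foaf:Image element."""
    root = ET.fromstring(xml_text)
    images = root.iterfind(".//foaf:Image", {"foaf": FOAF_NS})
    # ElementTree keys namespaced attributes as "{namespace-uri}localname".
    return [img.get("{%s}about" % RDF_NS) for img in images]


print(image_urls(SAMPLE))
```

In XPath itself, a prefixed name such as rdf:about only works if the rdf prefix is bound for the evaluator. If the attribute keeps its namespace after remove_namespaces() (I have not checked this for your Scrapy version), a namespace-agnostic test such as .../Image/@*[local-name()='about'] is another option to try.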
Upvotes: 2