Reputation: 549
I am trying to use XPath to get the @content attribute of the following HTML:
<meta content="52222" name="DCSext.job_id">
I use this XPath expression as part of a Scrapy spider:
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//*')
    for site in sites:
        il = DataItemLoader(response=response, selector=site)
        il.add_xpath('listing_id', 'meta[@name="DCSext.job_id"]@content')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        il.add_xpath('loc_pj', substring-after('h1[@class="title heading"]/text()',':'))
        il.add_xpath('title', 'head/title/text()')
        il.add_xpath('post_date', 'div[@id="extr"]/div/dl/dd[3]/text()')
        il.add_xpath('web_url', 'head/link[@rel="canon"]@href')
        yield il.load_item()
I get this error message for the underlined line:
exceptions.ValueError: Invalid XPath: meta[@name="DCSext.job_id"]@content
How can I fix this? Thanks a lot!
Upvotes: 0
Views: 988
Reputation: 549
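An attribute is selected as its own location step, so the expression needs a / before @content. The correct code should be:
meta[@name="DCSext.job_id"]/@content
                           ^
The same fix applies to the web_url line: head/link[@rel="canon"]@href should be head/link[@rel="canon"]/@href.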
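To check the difference outside the spider, here is a minimal sketch that evaluates both expressions against the meta tag from the question. It assumes Python 3 with lxml installed and is not the Scrapy API itself; the helper name job_id and the surrounding html/head/body wrapper are made up for illustration, and the leading // is only there because the expression is evaluated from the document root rather than from a selector.

```python
from lxml import etree, html

# The meta tag from the question, wrapped in a minimal page (the wrapper is assumed).
doc = html.fromstring(
    '<html><head><meta content="52222" name="DCSext.job_id"></head>'
    '<body></body></html>'
)


def job_id(expr):
    """Evaluate expr on the page; return the result list or an error string."""
    try:
        return doc.xpath(expr)
    except etree.XPathError as exc:  # base class of lxml's XPath errors
        return "invalid: %s" % exc


# Without the slash, the expression is rejected as invalid XPath.
broken = job_id('//meta[@name="DCSext.job_id"]@content')
# With the slash, @content is a separate step and the attribute value is returned.
fixed = job_id('//meta[@name="DCSext.job_id"]/@content')

print(broken)
print(fixed)
```

The attribute step returns a list of strings, so an item loader would still need to take the first element or apply an output processor.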
Upvotes: 1