Reputation: 257
I can extract the title of an article in two ways: with XPath or with a CSS selector. Both return the same text, but with one difference. In the JSON output, the XPath version stores the title in square brackets, ["Some Title"], while the CSS version stores it without brackets, "Some Title". I don't want the brackets. How can I get the same result with XPath?
Here is my code for extracting the title of a document:
CSS Selector
def parse_article(self, response):
    def extract_with_css(query):
        return response.css(query).get(default='').strip()
    yield {
        'title': extract_with_css('div#title h2::text')
    }
XPath
def parse_article(self, response):
    def extract_with_xpath(query):
        return response.xpath(query).extract()
    yield {
        'title': extract_with_xpath('//div[@id="title"]/h2/text()')
    }
Upvotes: 0
Views: 224
Reputation: 3717
Change extract() to get() in your code:
def extract_with_xpath(query):
    return response.xpath(query).get(default='').strip()
The extract() method returns a list of all matches, which is why the JSON output has square brackets, while get() returns only the first match as a plain string.
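To see why the brackets appear, here is a minimal sketch that uses only the standard library, so it runs without Scrapy. The helper first_or_default is a hypothetical stand-in for what get(default='') does on a selector result, and the list all_matches is an assumed example of what extract() would return for a single matching h2.

```python
import json


def first_or_default(matches, default=""):
    """Hypothetical stand-in for get(default=...): first match, or the default."""
    return matches[0] if matches else default


# Assumed example of what extract() returns for one matching <h2>: a list.
all_matches = ["  Some Title "]

# Serializing the whole list keeps the square brackets in the JSON output.
as_list = json.dumps({"title": all_matches})

# Taking only the first match gives a plain string, which can then be stripped.
as_string = json.dumps({"title": first_or_default(all_matches).strip()})

# With no matches, the default keeps .strip() from failing on None.
no_match = json.dumps({"title": first_or_default([]).strip()})

print(as_list)    # {"title": ["  Some Title "]}
print(as_string)  # {"title": "Some Title"}
print(no_match)   # {"title": ""}
```

The default='' argument matters for the same reason in the real code: without it, get() returns None when nothing matches, and calling .strip() on None raises an AttributeError.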
Upvotes: 2