Dan

Reputation: 257

Xpath vs. CSS selector in Scrapy: Why is data stored differently?

I can extract the title of an article in two ways: with an XPath expression or with a CSS selector. Both return the same text, but with one difference: in the exported JSON file, the XPath version stores the value in square brackets, ["Some Title"], while the CSS version stores it without brackets, "Some Title". I don't want the brackets. How do I get the same result using XPath?

Here is my code for extracting the title of a document:

CSS Selector

def parse_article(self, response):
    def extract_with_css(query):
        return response.css(query).get(default='').strip()

    yield {
        'title': extract_with_css('div#title h2::text')
    }

Xpath

def parse_article(self, response):
    def extract_with_xpath(query):
        return response.xpath(query).extract()

    yield {
        'title': extract_with_xpath('//div[@id="title"]/h2/text()')
    }

Upvotes: 0

Views: 224

Answers (1)

vezunchik

Reputation: 3717

Change extract() to get() in your code:

def extract_with_xpath(query):
    return response.xpath(query).get(default='').strip() 

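extract() returns a list of all matches, while get() returns only the first match as a string. A list is written to JSON with square brackets, which is why the XPath version looked different.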
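To see where the brackets come from without running a spider, here is a minimal sketch using only the standard library. The list `all_matches` is a stand-in for what extract() would return, and `first_match` is a hypothetical helper that only mimics the behavior of get(default=''); neither is part of Scrapy.

```python
import json

# Stand-in for response.xpath(query).extract(): a list of every match.
all_matches = ["  Some Title "]


def first_match(matches, default=""):
    """Hypothetical helper mimicking get(default=''): first match or the default."""
    return matches[0] if matches else default


# A list is serialized as a JSON array, hence the square brackets.
as_list = json.dumps({"title": all_matches})

# A single string is serialized as a plain JSON string.
as_string = json.dumps({"title": first_match(all_matches).strip()})

print(as_list)
print(as_string)
```

The second line printed has no brackets because the value is a string rather than a one-element list, which is the difference between the two snippets in the question.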

Upvotes: 2
