Reputation: 1813
I have my text editor set to indent with spaces only, and I've deleted and re-indented the code for good measure. I'm experimenting with Scrapy to see how different syntax performs, and I'm trying to switch between two blocks of code, one using add_xpath and the other using item. The code works fine for one version of the spider but not the other. It works for:
class BasicSpider(scrapy.Spider):
    name = 'basic'
    allowed_domains = ['web']
    start_urls = ['http://foobar.com']

    def parse(self, response):
        l = ItemLoader(item = TestItem(), response=response)
        item = TestItem()
        l.add_xpath('title', '/html/body/div[1]/article/header/div[3]/h1/text()')
        l.add_xpath('author', '/html/body/div[1]/article/div/div[2]/div[1]/span/span[1]/a/text()')
        l.add_xpath('published', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        l.add_xpath('year', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        l.add_xpath('month', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        item['publication'] = 'Foobar'
        l.add_xpath('content', '/html/body/div[1]/article/div/div[3]/div[1]//p/text()')
        return l.load_item(), item
However, if I want to comment out the above code and use only item instead of add_xpath:
'''
        l.add_xpath('title', '/html/body/div[1]/article/header/div[3]/h1/text()')
        l.add_xpath('author', '/html/body/div[1]/article/div/div[2]/div[1]/span/span[1]/a/text()')
        l.add_xpath('published', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        l.add_xpath('year', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        l.add_xpath('month', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        item['publication'] = 'Foobar'
        l.add_xpath('content', '/html/body/div[1]/article/div/div[3]/div[1]//p/text()')
        return l.load_item(), item
'''
        item['title'] = response.xpath('/html/body/div[1]/article/header/div[3]/h1/text()').extract()
        item['author'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[1]/span/span[1]/a/text()').extract()
        item['published'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[2]/time/text()').extract()
        item['year'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[2]/time/text()').extract()
        item['month'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[2]/time/text()').extract()
        item['publication'] = 'Foobar'
        item['content'] = response.xpath('/html/body/div[1]/article/div/div[3]/div[1]//p/text()').extract()
        return item
Python tells me I have indentation errors on item['title'] and all further lines until I unindent all the item lines all the way to the left, outside of the function, like this:
    def parse(self, response):
        #l = ItemLoader(item = TestItem(), response=response)
        item = TestItem()
item['title'] = response.xpath('/html/body/div[1]/article/header/div[3]/h1/text()').extract()
item['author'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[1]/span/span[1]/a/text()').extract()
If I also fully unindent the return statement, Python tells me that return is outside of the function, and when I move it back inside the function, I get an unexpected indent error. If I remove all the comments and the entire add_xpath block, the code works fine.
I'd like to be able to easily go back and forth between add_xpath and item, and I'm also not sure if I'm misunderstanding some rule about triple quotes. This happens in more than one text editor.
Upvotes: 0
Views: 901
Reputation: 1283
The triple quotes need to be indented to match the code you are commenting out. Triple quotes do not create a comment: they turn that code into a string literal, which Python still parses as a statement and therefore expects at the current indentation level. The string simply has no effect on your output.
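A minimal sketch of the difference, using only the built-in compile() so it runs without Scrapy. The parse function, the 'old' key, and the compiles helper are hypothetical stand-ins for the spider in the question, not its actual code.

```python
# Opening triple quote at column 0 inside an indented function body:
# the string ends the function, so the next indented line is unexpected.
BROKEN = """
def parse(response):
    item = {}
'''
    item['old'] = 'disabled'
'''
    item['title'] = response
    return item
"""

# Opening triple quote indented to the level of the surrounding statements.
FIXED = """
def parse(response):
    item = {}
    '''
    item['old'] = 'disabled'
    '''
    item['title'] = response
    return item
"""


def compiles(source: str) -> str:
    """Return 'ok' if the source compiles, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return type(exc).__name__


print(compiles(BROKEN))
print(compiles(FIXED))
```

If the goal is only to switch between the two blocks, prefixing each line with # avoids the problem altogether, because real comments are ignored regardless of where they start on the line.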
Upvotes: 2