snapcrack

Reputation: 1813

indentation error when commenting out with triple quotes in scrapy

I have my text editor set to indent with spaces only, and I've deleted and re-indented the code for good measure. I'm experimenting with Scrapy to see how different syntax performs, and I'm trying to switch between two blocks of code, one that uses add_xpath and one that assigns to item directly. The code works for one version of the spider but not the other. It works for:

class BasicSpider(scrapy.Spider):
    name = 'basic'
    allowed_domains = ['web']
    start_urls = ['http://foobar.com']

    def parse(self, response):
            l = ItemLoader(item = TestItem(), response=response)
            item = TestItem()

            l.add_xpath('title', '/html/body/div[1]/article/header/div[3]/h1/text()')
            l.add_xpath('author', '/html/body/div[1]/article/div/div[2]/div[1]/span/span[1]/a/text()')
            l.add_xpath('published', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
            l.add_xpath('year', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
            l.add_xpath('month', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
            item['publication'] = 'Foobar'
            l.add_xpath('content', '/html/body/div[1]/article/div/div[3]/div[1]//p/text()')

            return l.load_item(), item

However, if I want to comment out the above code and use only item instead of add_xpath:

'''
        l.add_xpath('title', '/html/body/div[1]/article/header/div[3]/h1/text()')
        l.add_xpath('author', '/html/body/div[1]/article/div/div[2]/div[1]/span/span[1]/a/text()')
        l.add_xpath('published', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        l.add_xpath('year', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        l.add_xpath('month', '/html/body/div[1]/article/div/div[2]/div[2]/time/text()')
        item['publication'] = 'Foobar'
        l.add_xpath('content', '/html/body/div[1]/article/div/div[3]/div[1]//p/text()')

        return l.load_item(), item
'''

        item['title'] = response.xpath('/html/body/div[1]/article/header/div[3]/h1/text()').extract()
        item['author'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[1]/span/span[1]/a/text()').extract()
        item['published'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[2]/time/text()').extract()
        item['year'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[2]/time/text()').extract()
        item['month'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[2]/time/text()').extract()
        item['publication'] = 'Foobar'
        item['content'] = response.xpath('/html/body/div[1]/article/div/div[3]/div[1]//p/text()').extract()

        return item

Python reports indentation errors on item['title'] and on every following line until I unindent all the item lines to the far left, outside of the function, like this:

 def parse(self, response):
        #l = ItemLoader(item = TestItem(), response=response)
        item = TestItem()
 item['title'] = response.xpath('/html/body/div[1]/article/header/div[3]/h1/text()').extract()
 item['author'] = response.xpath('/html/body/div[1]/article/div/div[2]/div[1]/span/span[1]/a/text()').extract()

If I do the same with the return statement and unindent it fully, Python reports that the return is outside the function; when I move it back inside the function, I get an unexpected indent error. If I delete the triple quotes and the entire add_xpath block, the code works fine.

I'd like to be able to switch easily between add_xpath and item, and I'm not sure whether I'm misunderstanding some rule about triple quotes. The same thing happens in multiple text editors.

Upvotes: 0

Views: 901

Answers (1)

jacoblaw

Reputation: 1283

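The triple quotes need to be indented to match the code you are commenting out. Triple quotes turn that code into a string rather than a comment: Python does not ignore it. It still parses the string as a statement, one that simply has no effect on your output, so it has to follow the indentation of the surrounding block like any other statement.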
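To see the rule in isolation, here is a minimal sketch that does not need Scrapy at all. The parse function, the dict standing in for the item, and the helper compiles are all made up for illustration; the three source strings differ only in how the old line is "commented out", and the built-in compile() is used to check whether each one parses.

```python
# Hypothetical stand-ins for the spider: a plain function and a dict item.

# Triple quotes at column 0: the string is a statement at module level,
# so it ends the function body and the next indented line is unexpected.
UNINDENTED_QUOTES = '''
def parse(response):
    item = {}
"""
    item['old'] = 1
"""
    item['title'] = response
    return item
'''

# Triple quotes indented to match the function body: the string is just
# another (no-op) statement inside parse.
INDENTED_QUOTES = '''
def parse(response):
    item = {}
    """
    item['old'] = 1
    """
    item['title'] = response
    return item
'''

# A real comment: lines containing only a comment are ignored when
# Python works out indentation, so the column of the # does not matter.
HASH_COMMENT = '''
def parse(response):
    item = {}
#    item['old'] = 1
    item['title'] = response
    return item
'''


def compiles(source):
    """Return True if the source parses, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


for name in ("UNINDENTED_QUOTES", "INDENTED_QUOTES", "HASH_COMMENT"):
    print(name, compiles(globals()[name]))
```

If you switch between the two blocks often, prefixing the lines with # (most editors have a toggle-comment shortcut) avoids the problem entirely, because comments are not statements and their indentation is not checked.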

Upvotes: 2
