gdogg371

Reputation: 4122

Python Shell not running Scrapy

I am running the Python.org build of Python 2.7 (64-bit) on Windows Vista (64-bit) to use Scrapy. My code works when I run it from the Command Shell (apart from some issues with the Command Shell not recognising non-Unicode characters). However, when I run the script from Python IDLE, I get the following message:

Warning (from warnings module):
  File "C:\Python27\mrscrap\mrscrap\spiders\test.py", line 24
    class MySpider(BaseSpider):
ScrapyDeprecationWarning: __main__.MySpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)

The code used to generate this error is:

from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
import re

class MySpider(BaseSpider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Asia"]

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for titles in titles:

            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2)

Firstly, what causes this error when the code works fine in the Command Shell? Secondly, when I follow the instructions in the message and replace both instances of BaseSpider with Spider, the code runs in the Python shell but does nothing: no errors, no warnings, and nothing printed to the log.

Can anyone tell me why this revised version of the code is not printing its output in Python IDLE?

Thanks

Upvotes: 0

Views: 2288

Answers (1)

Padraic Cunningham

Reputation: 180401

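Running the file in IDLE only defines the MySpider class; nothing in the script starts a crawl, which is why the revised version prints nothing. From the command line, the scrapy command starts the crawl for you.

To start the crawl from the script itself, add from scrapy.cmdline import execute to your imports.

Then call execute(['scrapy','crawl','wiki']) at the end of the file and run your script: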

import re

from scrapy.cmdline import execute
from scrapy.selector import Selector
from scrapy.spider import Spider
from scrapy.utils.markup import remove_tags


class MySpider(Spider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Asia"]

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for title in titles:
            # Extract every <p> element, join them, and strip the HTML tags.
            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2)


# Start the crawl only when this file is run as a script (e.g. from IDLE),
# not when Scrapy imports the module while loading spiders.
if __name__ == "__main__":
    execute(['scrapy', 'crawl', 'wiki'])
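
On the first part of the question: the ScrapyDeprecationWarning is a warning rather than an exception, so on its own it should not stop the script. Here is a minimal sketch of that behaviour using only the standard warnings module. DeprecatedBaseWarning and define_spider are hypothetical stand-ins invented for this illustration; they are not part of Scrapy.

```python
import warnings


class DeprecatedBaseWarning(UserWarning):
    """Hypothetical stand-in for a library deprecation warning."""


def define_spider():
    # Emit a warning, roughly as a library might when a deprecated
    # base class is subclassed (an assumption for illustration).
    warnings.warn("inherits from deprecated class", DeprecatedBaseWarning)
    # Execution continues past the warning; nothing is raised.
    return "class defined"


# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = define_spider()

print(result)                        # the function still returned normally
print(len(caught))                   # one warning was recorded
print(caught[0].category.__name__)   # name of the warning class
```

The function returns normally even though a warning was issued, which matches what you saw: the message appeared, but the script itself was not interrupted.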

Upvotes: 1
