Reputation: 4122
I am running the Python.org build of Python 2.7 (64-bit) on Windows Vista 64-bit to use Scrapy. I have some code that works when I run it from the Command Shell (apart from some issues with the Command Shell not recognising non-Unicode characters). However, when I try running the script from Python IDLE, I get the following error message:
Warning (from warnings module):
  File "C:\Python27\mrscrap\mrscrap\spiders\test.py", line 24
    class MySpider(BaseSpider):
ScrapyDeprecationWarning: __main__.MySpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
The code used to generate this error is:
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
import re

class MySpider(BaseSpider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Asia"]

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for titles in titles:
            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2)
Firstly, what causes this error in IDLE when the code works fine in the Command Shell? Secondly, when I follow the instructions in the message and replace both instances of BaseSpider in the code with just 'Spider', the code runs in the Python shell but does nothing: no errors, no warnings, and nothing printed to the log.
Can anyone tell me why this revised version of the code is not printing its output in Python IDLE?
Thanks
Upvotes: 0
Views: 2288
Reputation: 180401
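Running the file in IDLE only defines the spider class; nothing actually starts a crawl, so nothing is printed. Add from scrapy.cmdline import execute to your imports, then put execute(['scrapy','crawl','wiki']) at the end of the script and run it: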
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
import re
from scrapy.cmdline import execute

class MySpider(Spider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Asia"]

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for title in titles:
            # Collect every <p> element, join them, and strip the HTML tags.
            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2)

# Start the crawl only when this file is run as a script,
# not when Scrapy imports it to discover spiders.
if __name__ == "__main__":
    execute(['scrapy', 'crawl', 'wiki'])
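As for the first part of the question: the message IDLE shows is a deprecation warning, not an error, so by itself it does not stop the spider from running. Here is a minimal sketch of that behaviour using only the standard warnings module. The names ExampleDeprecationWarning and make_spider_name are made up for illustration and are not part of Scrapy; the sketch should run on both Python 2.7 and Python 3.

```python
import warnings


class ExampleDeprecationWarning(Warning):
    """Hypothetical stand-in for ScrapyDeprecationWarning."""


def make_spider_name(name):
    # Emit a warning, then carry on: a warning does not abort execution.
    warnings.warn("old API used, please switch to the new one",
                  ExampleDeprecationWarning)
    return name.upper()


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = make_spider_name("wiki")

print(result)                       # the function still returned a value
print(len(caught))                  # number of warnings recorded
print(caught[0].category.__name__)  # which warning class was raised
```

The function returns normally even though a warning was issued, which is why the original BaseSpider code still works from the Command Shell.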
Upvotes: 1