Reputation: 9725
I was able to generate the first spider OK:
Thu Feb 27 - 01:59 PM > scrapy genspider confluenceChildPages confluence
Created spider 'confluenceChildPages' using template 'crawl' in module:
dirbot.spiders.confluenceChildPages
But when I attempted to generate another spider, I got this:
Thu Feb 27 - 01:59 PM > scrapy genspider xxx confluence
Traceback (most recent call last):
File "/usr/bin/scrapy", line 5, in <module>
pkg_resources.run_script('Scrapy==0.22.2', 'scrapy')
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 505, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1245, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
execute()
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 89, in _run_print_help
func(*a, **kw)
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 150, in _run_command
cmd.run(args, opts)
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/commands/genspider.py", line 68, in run
crawler = self.crawler_process.create_crawler()
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/crawler.py", line 87, in create_crawler
self.crawlers[name] = Crawler(self.settings)
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/crawler.py", line 25, in __init__
self.spiders = spman_cls.from_crawler(self)
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 35, in from_crawler
sm = cls.from_settings(crawler.settings)
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 31, in from_settings
return cls(settings.getlist('SPIDER_MODULES'))
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 22, in __init__
for module in walk_modules(name):
File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/utils/misc.py", line 68, in walk_modules
submod = import_module(fullpath)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/d/Work/TollOnline/Notes/Issues/JIRA/TOL-821_Review_Toll_Online_Confluence_Pages/dirbot-master/dirbot/spiders/confluenceChildPages.py", line 4, in <module>
from scrapybot.items import ScrapybotItem
ImportError: No module named scrapybot.items
Update (Thursday 27 February 2014, 07:35:24 PM): adding the information @omair_77 was asking for.
I am using dirbot from https://github.com/scrapy/dirbot.
Initial directory structure is:
.
./.gitignore
./dirbot
./dirbot/items.py
./dirbot/pipelines.py
./dirbot/settings.py
./dirbot/spiders
./dirbot/spiders/dmoz.py
./dirbot/spiders/__init__.py
./dirbot/__init__.py
./README.rst
./scrapy.cfg
./setup.py
I then try to create two spiders:
scrapy genspider confluenceChildPagesWithTags confluence
scrapy genspider confluenceChildPages confluence
and I get the error on the second genspider command.
Update (Wednesday 5 March 2014, 02:16:07 PM): adding information in relation to @Darian's answer, showing that scrapybot pops up only after the first genspider command.
Wed Mar 05 - 02:12 PM > find .
.
./.gitignore
./dirbot
./dirbot/items.py
./dirbot/pipelines.py
./dirbot/settings.py
./dirbot/spiders
./dirbot/spiders/dmoz.py
./dirbot/spiders/__init__.py
./dirbot/__init__.py
./README.rst
./scrapy.cfg
./setup.py
Wed Mar 05 - 02:13 PM > find . -type f -print0 | xargs -0 grep -i scrapybot
Wed Mar 05 - 02:14 PM > scrapy genspider confluenceChildPages confluence
Created spider 'confluenceChildPages' using template 'crawl' in module:
dirbot.spiders.confluenceChildPages
Wed Mar 05 - 02:14 PM > find .
.
./.gitignore
./dirbot
./dirbot/items.py
./dirbot/items.pyc
./dirbot/pipelines.py
./dirbot/settings.py
./dirbot/settings.pyc
./dirbot/spiders
./dirbot/spiders/confluenceChildPages.py
./dirbot/spiders/dmoz.py
./dirbot/spiders/dmoz.pyc
./dirbot/spiders/__init__.py
./dirbot/spiders/__init__.pyc
./dirbot/__init__.py
./dirbot/__init__.pyc
./README.rst
./scrapy.cfg
./setup.py
Wed Mar 05 - 02:17 PM > find . -type f -print0 | xargs -0 grep -i scrapybot
./dirbot/spiders/confluenceChildPages.py:from scrapybot.items import ScrapybotItem
./dirbot/spiders/confluenceChildPages.py: i = ScrapybotItem()
and the newly generated confluenceChildPages.py is:
from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapybot.items import ScrapybotItem


class ConfluencechildpagesSpider(CrawlSpider):
    name = 'confluenceChildPages'
    allowed_domains = ['confluence']
    start_urls = ['http://www.confluence/']

    rules = (
        Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        sel = Selector(response)
        i = ScrapybotItem()
        #i['domain_id'] = sel.xpath('//input[@id="sid"]/@value').extract()
        #i['name'] = sel.xpath('//div[@id="name"]').extract()
        #i['description'] = sel.xpath('//div[@id="description"]').extract()
        return i
So I can see it references scrapybot, but I am not sure how to fix it; I'm still very much a n00b.
Upvotes: 0
Views: 535
Reputation: 4084
You see this last line in the traceback:
File "/d/Work/TollOnline/Notes/Issues/JIRA/TOL-821_Review_Toll_Online_Confluence_Pages/dirbot-master/dirbot/spiders/confluenceChildPages.py", line 4, in <module>
from scrapybot.items import ScrapybotItem
This tells me that the first spider you generated, confluenceChildPages, thinks it needs to import items from a module called scrapybot, but that module doesn't exist. If you look inside confluenceChildPages.py you'll be able to see the line that is causing the error.
I'm not actually sure, off the top of my head, which setting it uses to generate that, but if you look (grep) for scrapybot within your project you should find where it is getting it from, and then be able to change it to dirbot, which looks like the module you want.
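If the grep turns up nothing inside the project itself, the name may be coming from Scrapy's own defaults: in Scrapy 0.22 the default BOT_NAME is scrapybot, and genspider appears to substitute BOT_NAME as the project name in its spider templates. Assuming that is the mechanism, setting it explicitly in dirbot/settings.py should make freshly generated spiders import from dirbot instead. A minimal sketch:

# dirbot/settings.py -- sketch; assumes the genspider templates
# substitute BOT_NAME as the project/module name
BOT_NAME = 'dirbot'

SPIDER_MODULES = ['dirbot.spiders']
NEWSPIDER_MODULE = 'dirbot.spiders'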
You will then need to delete the first spider it generated and re-generate it. The command errors the second time you run it because it loads the first spider as part of the project, and since that spider has an import error in it, you get the traceback.
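A sketch of those steps, using the paths from your listing:

rm dirbot/spiders/confluenceChildPages.py
rm -f dirbot/spiders/confluenceChildPages.pyc  # if present; Python 2 happily imports stale bytecode
scrapy genspider confluenceChildPages confluence

Then check the import line the template writes, because the item class it names has to actually exist. dirbot's items.py defines a Website item (per the upstream repo), so the working import in the spider should end up as:

from dirbot.items import Website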
Cheers.
Upvotes: 1
Reputation: 2136
Show your directory hierarchy for a better solution. This problem mostly occurs when your spider module is named the same as your Scrapy project module, so Python tries to import items relative to the spider. Make sure your project module and spider module do not share a name; a hypothetical illustration follows.
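A hypothetical layout that triggers this (names invented for illustration):

myproject/
    items.py
    spiders/
        myproject.py  # spider module shares the project package's name

Inside spiders/myproject.py, a line like

from myproject.items import MyprojectItem

is resolved by Python 2's implicit relative imports against the module myproject inside the spiders package (i.e. the spider file itself) instead of the top-level myproject package, so the import fails. Renaming the spider module, e.g. to myproject_spider.py, avoids the clash.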
Upvotes: 1