Reputation: 829
I have implemented a little crawler in Python and I wanted to export the results to Elasticsearch as explained in this tutorial.
I made the fix proposed in the comments, because the Elasticsearch-for-Scrapy plugin has been updated (cf. the GitHub link), and I changed ELASTICSEARCH_UNIQ_KEY to an existing field of my scraper. Of course, I installed the plugin and checked that my spider works: I successfully get JSON output with the command scrapy crawl brand -o output.json
(where brand is the name of my spider).
I installed Elasticsearch and it is running; I was able to reproduce some examples found here. But it doesn't work when I use the following command: scrapy crawl brand.
I added quotes in the line ELASTICSEARCH_LOG_LEVEL = 'log.DEBUG'
in the settings.py file, since log is not recognized without them. But now, I get the following error:
Traceback (most recent call last):
File "C:\Users\stephanie\Downloads\WinPython-32bit-2.7.9.2\python-2.7.9\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "C:\Users\stephanie\Downloads\WinPython-32bit-2.7.9.2\python-2.7.9\lib\site-packages\scrapyelasticsearch\scrapyelasticsearch.py", line 70, in process_item
self.index_item(item)
File "C:\Users\stephanie\Downloads\WinPython-32bit-2.7.9.2\python-2.7.9\lib\site-packages\scrapyelasticsearch\scrapyelasticsearch.py", line 53, in index_item
log.msg("Generated unique key %s" % local_id, level=self.settings.get('ELASTICSEARCH_LOG_LEVEL'))
File "C:\Users\stephanie\Downloads\WinPython-32bit-2.7.9.2\python-2.7.9\lib\site-packages\scrapy\log.py", line 49, in msg
logger.log(level, message, *[kw] if kw else [])
File "C:\Users\stephanie\Downloads\WinPython-32bit-2.7.9.2\python-2.7.9\lib\logging\__init__.py", line 1220, in log
raise TypeError("level must be an integer")
TypeError: level must be an integer
2015-08-04 02:06:02 [scrapy] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2015-08-04 02:06:02 [scrapy] INFO: Closing spider (finished)
2015-08-04 02:06:02 [selenium.webdriver.remote.remote_connection] DEBUG: DELETE
http://127.0.0.1:49654/hub/session/209677e4-1577-4f05-a418-8554159d8c74/window {
"sessionId": "209677e4-1577-4f05-a418-8554159d8c74"}
2015-08-04 02:06:03 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2015-08-04 02:06:03 [scrapy] INFO: Dumping Scrapy stats:
I am using Python 2.7 and Elasticsearch 1.7.1. Do I have to do some configuration with Elasticsearch, or what else may cause this error? Thanks for your help.
Upvotes: 3
Views: 4752
Reputation: 5885
I don't have an Elasticsearch setup to try this on, but you could try modifying settings.py. Add the following to the top of settings.py:
import logging
And change
ELASTICSEARCH_LOG_LEVEL= 'log.DEBUG'
to
ELASTICSEARCH_LOG_LEVEL= logging.DEBUG
If the above still doesn't work, you can try this instead (logging.DEBUG is just the integer 10):
ELASTICSEARCH_LOG_LEVEL= 10
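For background, here is a quick stdlib sanity check (not specific to the plugin) showing why the original setting fails: logging levels are plain integers, so when the pipeline passes the string 'log.DEBUG' down to logger.log(), the logging module rejects it with exactly the TypeError from your traceback.

```python
import logging

# Standard logging levels are integer constants.
assert logging.DEBUG == 10

# Passing a string where an integer level is expected reproduces the crash:
try:
    logging.getLogger(__name__).log('log.DEBUG', 'some message')
except TypeError as e:
    print(e)  # level must be an integer
```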
Upvotes: 4