Reputation: 55
I've been following this tutorial (http://blog.florian-hopf.de/2014/07/scrapy-and-elasticsearch.html) and using this scrapy elasticsearch pipeline (https://github.com/knockrentals/scrapy-elasticsearch) and am able to extract data from scrapy to a JSON file and have an elasticsearch server up and running on localhost.
However, when I attempt to send scraped data into elasticsearch using the pipeline, I get the following error:
2015-08-05 21:21:53 [scrapy] ERROR: Error processing {'link': [u'http://www.meetup.com/Search-Meetup-Karlsruhe/events/221907250/'],
'title': [u'Alles rund um Elasticsearch']}
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 70, in process_item
    self.index_item(item)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 52, in index_item
    local_id = hashlib.sha1(item[uniq_key]).hexdigest()
TypeError: must be string or buffer, not list
My items.py Scrapy file looks like this:
from scrapy.item import Item, Field

class MeetupItem(Item):
    title = Field()
    link = Field()
    description = Field()
And (what I think is the only relevant part of) my settings.py file looks like this:
from scrapy import log
ITEM_PIPELINES = [
'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline',
]
ELASTICSEARCH_SERVER = 'localhost' # If not 'localhost' prepend 'http://'
ELASTICSEARCH_PORT = 9200 # If port 80 leave blank
ELASTICSEARCH_USERNAME = ''
ELASTICSEARCH_PASSWORD = ''
ELASTICSEARCH_INDEX = 'meetups'
ELASTICSEARCH_TYPE = 'meetup'
ELASTICSEARCH_UNIQ_KEY = 'link'
ELASTICSEARCH_LOG_LEVEL= log.DEBUG
Any help would be greatly appreciated!
Upvotes: 3
Views: 817
Reputation: 3691
As you can see in the error message: Error processing {'link': [u'http://www.meetup.com/Search-Meetup-Karlsruhe/events/221907250/'], 'title': [u'Alles rund um Elasticsearch']}
your item's link and title fields are lists (the square brackets around the values indicate this).
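To make the failure concrete, here is a minimal sketch that reproduces it outside of Scrapy. It only uses the standard library; the variable names are illustrative, and the explicit UTF-8 encoding is my assumption so that the snippet also runs on Python 3 (the pipeline in your traceback passes the value to hashlib.sha1 directly).

```python
import hashlib

# The value as it appears in the logged item: a list holding one string.
link_values = [u'http://www.meetup.com/Search-Meetup-Karlsruhe/events/221907250/']

# Passing the list itself to sha1 fails with a TypeError,
# which is what the traceback from the pipeline shows.
try:
    hashlib.sha1(link_values)
    list_failed = False
except TypeError:
    list_failed = True

# Passing the first element (a single string) works.
# Encoding to UTF-8 bytes is an assumption made for Python 3 compatibility.
first_link = link_values[0]
local_id = hashlib.sha1(first_link.encode('utf-8')).hexdigest()

print(list_failed)
print(local_id)
```

So the value stored under the key named in ELASTICSEARCH_UNIQ_KEY has to be a single string by the time the item reaches the pipeline.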
This comes from how you extract the values in Scrapy: .extract() returns a list of all matches, even when there is only one. You did not post your spider code with the question, but you should use response.xpath().extract()[0] to get the first result of the list. Naturally, in that case you should be prepared for empty result sets, because indexing an empty list raises an IndexError.
Update
To handle the case where nothing is extracted, you could use something like the following:
linkSelection = response.xpath().extract()
item['link'] = linkSelection[0] if linkSelection else ""
Or something similar, depending on your data and fields. None could also be a valid default if the list is empty, although the field you use as ELASTICSEARCH_UNIQ_KEY still has to end up as a string, since the pipeline hashes it.
The basic idea is to separate the XPath extraction from the list-item selection, and to pick an item from the list only if the list actually contains one.
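If you have several fields, the same check can be wrapped in a small helper. This is only a sketch: first_or_default is a hypothetical name of my own, not part of Scrapy or of the Elasticsearch pipeline, and the sample lists stand in for whatever your .extract() calls return.

```python
def first_or_default(values, default=""):
    """Return the first element of values, or default if values is empty."""
    # An empty list is falsy, so this never indexes into an empty result.
    return values[0] if values else default


# Stand-ins for the lists returned by .extract() in a spider.
title_selection = [u'Alles rund um Elasticsearch']
description_selection = []  # nothing matched for this field

item = {
    'title': first_or_default(title_selection),
    'description': first_or_default(description_selection, default=None),
}
print(item)
```

In the spider you would pass the result of each .extract() call to the helper instead of indexing it with [0]. Scrapy 1.0 and later also provide extract_first() on selector lists, which returns None (or a default you pass in) when nothing matched, so check whether your version has it before writing your own helper.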
Upvotes: 2