Reputation: 548
So I have an items.py with the following:
class ScrapyItem(scrapy.Item):
    source = scrapy.Field()
    link = scrapy.Field()
and the json output is:
[{"source": "Some source", "link":"www.somelink.com"},
{"source": "Some source again", "link":"www.somelink.org"}]
Is there a way to change the output to:
[{"source1": "Some source", "link1":"www.somelink.com"},
{"source2": "Some source again", "link2":"www.somelink.org"}]
From the docs, I saw that you can manipulate the item values. Can you do the same to the item keys themselves?
EDIT
Here's the new code I'm using for the output, with an article_id item field:
article_id = [1]
def parse_common(self, response):
    feed = feedparser.parse(response.body)
    for entry_n, entry in enumerate(feed.entries, start=article_id[-1]):
        try:
            item = NewsbyteItem()
            item['source'] = response.url
            item['title'] = lxml.html.fromstring(entry.title).text
            item['link'] = entry.link
            item['description'] = entry.description
            item['article_id'] = '%d' % entry_n
            article_id.append(entry_n)
            request = Request(
                entry.link,
                callback=getattr(self, response.meta['method']),
                dont_filter=response.meta.get('dont_filter', False)
            )
            request.meta['item'] = item
            request.meta['entry'] = entry
            yield request
        except Exception as e:
            print('%s: %s' % (type(e), e))
            print(entry)
The problem is that entry_n restarts whenever the spider moves on to another URL. That's why I used the list.
Upvotes: 2
Views: 904
Reputation: 10349
From the discussion:
"The purpose of the identifier is if an item has some data missing or includes data that isn't needed, I can find that dictionary right away and refactor the code accordingly."
With that purpose in mind, I'd suggest generating UUIDs. Same effect, less hassle:
# item definition
class ScrapyItem(scrapy.Item):
    source = scrapy.Field()
    link = scrapy.Field()
    uuid = scrapy.Field()
# processing
import uuid
def parse_common(self, response):
    ...
    # store the UUID as a string so it is plain text in the JSON output
    item['uuid'] = str(uuid.uuid4())
    ...
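To see what such an identifier looks like outside of a spider, here is a minimal sketch that works on plain dicts rather than Scrapy items. The helper name tag_with_uuid is hypothetical, and converting the UUID with str() is my assumption about what is most convenient for JSON output:

```python
import json
import uuid


def tag_with_uuid(item):
    """Return a copy of the item with a random identifier under 'uuid'."""
    tagged = dict(item)
    # uuid4() returns a UUID object; str() turns it into a 36-character
    # string that json.dumps can serialize.
    tagged["uuid"] = str(uuid.uuid4())
    return tagged


items = [
    {"source": "Some source", "link": "www.somelink.com"},
    {"source": "Some source again", "link": "www.somelink.org"},
]
tagged_items = [tag_with_uuid(item) for item in items]
print(json.dumps(tagged_items, indent=2))
```

Because each identifier is generated independently, it does not matter which URL an entry came from or in what order the responses arrive, so no shared counter is needed.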
Upvotes: 0
Reputation: 5562
I don't recommend identifying different items by changing the keys of their fields. Instead, you can build a dictionary that gives each item its own name, something like:
output = [{"source": "Some source", "link":"www.somelink.com"}, {"source": "Some source again", "link":"www.somelink.org"}]
output_dict = {}
for counter, item in enumerate(output):
    # name the items 'item1', 'item2', ...
    output_dict['item' + str(counter + 1)] = item
print(output_dict)
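If the suffixed keys from the question are still required, the same enumerate pattern can produce them. This is only a sketch, assuming the items are plain dicts that are post-processed after the crawl; suffix_keys is a hypothetical helper name, not part of Scrapy:

```python
def suffix_keys(items, start=1):
    """Return new dicts whose keys carry the item's position as a suffix."""
    return [
        # 'source' becomes 'source1' for the first item, 'source2' for the second, ...
        {"%s%d" % (key, n): value for key, value in item.items()}
        for n, item in enumerate(items, start=start)
    ]


output = [
    {"source": "Some source", "link": "www.somelink.com"},
    {"source": "Some source again", "link": "www.somelink.org"},
]
renamed = suffix_keys(output)
print(renamed)
```

Note that every item then has a different set of keys, which makes the output harder to consume than a shared schema with one identifying field.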
Upvotes: 3