
Reputation: 9144

How to maintain Item's Field Names in one file with Scrapy?

Working with Scrapy means writing an Item's field names in several places.

1. Item class (Items.py)

class HelloItem(scrapy.Item):
   Name = scrapy.Field()
   Address = scrapy.Field()
   ...

2. Spider class (spider.py)

class HelloSpider(scrapy.Spider):

    def parse(self, response):
        item = HelloItem()
        item["Name"] = ...
        item["Address"] = ...
        ...

3. settings.py

EXPORT_FIELDS = ["Name", "Address", ...]

I defined an EXPORT_FIELDS setting in settings.py to set the field order for a custom CSV item pipeline. The pipeline is a typical exporter-based CSV pipeline, except that self.exporter.fields_to_export is loaded with settings.getlist("EXPORT_FIELDS").


As you can see, I have to define the field names (Name, Address, etc.) in three places. If I ever rename a field, I have to change it in all three files.

So is there a way to define the Item's field names in just one file? (Two files would also be acceptable; fewer is better.)

Upvotes: 3

Views: 793

Answers (1)

Gallaecio

Reputation: 3857

You could skip items entirely and yield dictionaries instead. That way, you would not need items.py at all.
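A minimal sketch of the dictionary approach, assuming the field names live in one constant. FIELD_NAMES, make_item and to_csv are hypothetical names used only for illustration, and the CSV part uses the standard library rather than a Scrapy exporter, so treat it as an outline of the idea rather than a drop-in pipeline:

```python
import csv
import io

# Hypothetical single source of truth for field names and their CSV order.
FIELD_NAMES = ("Name", "Address")


def make_item(*values):
    """Pair positional values with FIELD_NAMES to build a plain dict item."""
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} values, got {len(values)}")
    return dict(zip(FIELD_NAMES, values))


def to_csv(items):
    """Write dict items to CSV text, with columns in FIELD_NAMES order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELD_NAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()
```

In a spider, parse could then yield make_item(...), and since settings.py is an ordinary Python module, it could presumably import the same constant and set EXPORT_FIELDS = list(FIELD_NAMES), which would leave a rename touching only one file.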

However, as a project grows, defining an Item subclass is recommended, and the repetition you mention is a lesser evil.

Because you define an Item, you get an error (a KeyError) when one of your spiders tries to set a field whose name contains a typo.

Item classes also allow you to work with item loaders.

Upvotes: 0
