Reputation: 9144
In Scrapy, I end up writing the Item field names in a lot of places.
1. Item class (items.py)
class HelloItem(scrapy.Item):
    Name = scrapy.Field()
    Address = scrapy.Field()
    ...
2. Spider class (spider.py)
class HelloSpider(scrapy.Spider):
    def parse(self, response):
        item = HelloItem()
        item["Name"] = ...
        item["Address"] = ...
        ...
3. settings.py
EXPORT_FIELDS = ["Name", "Address", ...]
I defined an EXPORT_FIELDS setting in settings.py to set the field order for a custom CSV item pipeline. The CSV pipeline code is like this, except that self.exporter.fields_to_export is loaded with settings.getlist("EXPORT_FIELDS").
You can see there are three places I have to define the field names (Name, Address, etc). If one day I have to rename some field names, I have to change them in those three files.
So is there a way to define the Item's field names in just one file? (Two files would also be fine; fewer is better.)
Upvotes: 3
Views: 793
Reputation: 3857
You could skip items entirely and yield dictionaries instead. That way, you would not need items.py at all.
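A minimal sketch of that idea, assuming you keep the field names in one list and build plain dicts through a small helper. FIELD_NAMES and make_item are hypothetical names for illustration, not part of Scrapy:

```python
# Single source of truth for the field names and their export order.
# FIELD_NAMES and make_item are illustrative names, not Scrapy API.
FIELD_NAMES = ["Name", "Address"]


def make_item(**values):
    """Build a plain dict item, rejecting field names not in FIELD_NAMES."""
    unknown = sorted(set(values) - set(FIELD_NAMES))
    if unknown:
        raise KeyError(f"Unknown field(s): {', '.join(unknown)}")
    # Missing fields default to None so every item has the same keys,
    # in the order given by FIELD_NAMES.
    return {name: values.get(name) for name in FIELD_NAMES}


# In a spider's parse() you would yield make_item(Name=..., Address=...).
item = make_item(Address="1 Main St", Name="Alice")
```

The same list could then be imported in settings.py and assigned to EXPORT_FIELDS, so a rename only touches the list and the spider code that fills the fields. The helper's KeyError only stands in for the typo check that an Item subclass would otherwise give you.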
However, as a project grows, defining an Item subclass is recommended, and the repetition you mention is a lesser evil.
Because you define an Item, you get an error (a KeyError) when one of your spiders tries to set an item field whose name contains a typo.
Item classes also allow you to work with item loaders.
Upvotes: 0