Reputation: 185
I have written a spider that scrapes a webpage and populates the fields of an item. The item has the following fields:
class exampleitem(Item):
    ex1 = Field()
    ex2 = Field()
    ex3 = Field()
    # ... and so forth
When I scrape and export to an XML file, the order of the fields gets mixed up, and the output looks something like this:
<items>
    <item>
        <ex2> <value> xyz </value> </ex2>
        <ex3> <value> abc </value> </ex3>
        <ex1> <value> ghi </value> </ex1>
    </item>
    ... so forth
</items>
I want the XML to list the fields in exactly the order in which they are declared as Field() entries in my item.py file.
I've been researching this for the past hour or so, and I know it has something to do with my pipeline and with using XmlItemExporter, but I have no idea how to write a custom pipeline or even where to start.
In short, I am getting lost in the jargon, and I'd appreciate it if anyone could point me in the right direction or give me a short code example of how to begin formatting my scraped items!
Thank you so much
Upvotes: 2
Views: 1122
Reputation: 11396
Scrapy items are wrappers around a Python dict and return the item fields in an unpredictable order:
def keys(self):
    return self._values.keys()
To change that, you can either override this method in your item, like this:
class exampleitem(Item):
    ex1 = Field()
    ex2 = Field()
    ex3 = Field()

    def keys(self):
        # Return the field names in the order they should be exported.
        return ['ex1', 'ex2', 'ex3']
or, more generically, reimplement DictItem so that it uses Python's OrderedDict instead of the default dict it currently uses.
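To illustrate the idea without depending on Scrapy itself, here is a minimal sketch using only the standard library. The class OrderedItem, its field_order attribute, and the to_xml helper are all hypothetical names made up for this example; they are not Scrapy's API. The sketch only shows that when keys() follows a declared order, anything that serializes by walking keys() comes out in that order, regardless of the order in which the fields were filled in.

```python
import xml.etree.ElementTree as ET


class OrderedItem:
    """Hypothetical stand-in for a Scrapy item; not Scrapy's own class."""

    # Declared order of the fields; keys() follows this order.
    field_order = ("ex1", "ex2", "ex3")

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        if key not in self.field_order:
            raise KeyError(f"unknown field: {key}")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def keys(self):
        # Populated fields in declaration order, not insertion order.
        return [name for name in self.field_order if name in self._values]


def to_xml(item):
    """Serialize one item, walking the fields in the order keys() gives."""
    element = ET.Element("item")
    for name in item.keys():
        ET.SubElement(element, name).text = str(item[name])
    return ET.tostring(element, encoding="unicode")


item = OrderedItem()
item["ex2"] = "xyz"  # populated out of order on purpose
item["ex3"] = "abc"
item["ex1"] = "ghi"
print(to_xml(item))
```

In a real project the same effect would come from overriding keys() on the item as described above. Scrapy's item exporters also accept a fields_to_export list, which may be a simpler route, but check the documentation for your Scrapy version before relying on it.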
Upvotes: 5