Python Learner

Reputation: 185

How to order xml with item fields in scrapy?

I have written a spider that scrapes a webpage and populates the fields of an item. The item's fields are declared as follows:

class exampleitem():
    ex1 = Field()
    ex2 = Field()
    ex3 = Field()
    ... and so forth

When I scrape and export to an XML file, the order of the fields gets mixed up and the output looks something like this:

<items>
    <item>
        <ex2> <value> xyz </value> </ex2>
        <ex3> <value> abc </value> </ex3>
        <ex1> <value> ghi </value> </ex1>
    </item>
    ... so forth
</items>

I want the XML to list the fields in exactly the order in which they are declared with Field() in my item.py file.

I've been researching this for the past hour or so, and I know it has something to do with my pipeline and with using XmlItemExporter, but I have no idea how to write a custom pipeline or even where to start.

In short, I am getting lost in the jargon and I'd appreciate it if anyone could point me in a direction or give me a short example code of how I can begin to format my scraped items!

Thank you so much

Upvotes: 2

Views: 1122

Answers (1)

Guy Gavriely

Reputation: 11396

Scrapy Items are wrappers around a Python dict, so they return the item fields in an unpredictable order:

def keys(self):
    return self._values.keys()

To change that, you can either override this method in your item, like:

from scrapy.item import Item, Field

class exampleitem(Item):
    ex1 = Field()
    ex2 = Field()
    ex3 = Field()

    def keys(self):
        # Return the field names in the order they should be exported.
        return ['ex1', 'ex2', 'ex3']

or, more generically, write your own DictItem implementation that stores its values in Python's OrderedDict instead of the default dict it currently uses.
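To illustrate the OrderedDict idea without depending on Scrapy internals, here is a rough, standalone sketch. The OrderedItem class, its field_order attribute, and the item_to_xml helper are all hypothetical names made up for this example; they are not part of Scrapy, and a real DictItem replacement would need to match whatever your Scrapy version expects.

```python
from collections import OrderedDict
import xml.etree.ElementTree as ET


class OrderedItem:
    """Hypothetical stand-in for a Scrapy item; not part of Scrapy."""

    # Field names in the order they should be exported (assumed for this example).
    field_order = ("ex1", "ex2", "ex3")

    def __init__(self, **values):
        unknown = set(values) - set(self.field_order)
        if unknown:
            raise KeyError("unknown fields: %s" % sorted(unknown))
        # Store values in declaration order, whatever order they arrive in.
        self._values = OrderedDict(
            (name, values[name]) for name in self.field_order if name in values
        )

    def keys(self):
        # Only populated fields are returned, always in declaration order.
        return list(self._values.keys())

    def __getitem__(self, name):
        return self._values[name]


def item_to_xml(item):
    """Serialize one item to an <item> element, following item.keys()."""
    root = ET.Element("item")
    for name in item.keys():
        ET.SubElement(root, name).text = str(item[name])
    return ET.tostring(root, encoding="unicode")


# Values are passed in a scrambled order on purpose.
item = OrderedItem(ex2="xyz", ex3="abc", ex1="ghi")
xml_text = item_to_xml(item)
print(xml_text)
```

Because keys() only reports fields that were actually set, a partially filled item does not raise a KeyError during export, which can happen with a hard-coded list of names. Depending on your Scrapy version, the exporters' fields_to_export option may be a simpler way to get the same ordering, so it is worth checking before writing a custom item class.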

Upvotes: 5
