user1592380

Reputation: 36307

How to get order of fields in Scrapy item

I'm interested in keeping a reference to the order of the field names in a Scrapy item. Where is this stored?

>>> dir(item)
Out[7]: 
['_MutableMapping__marker',
 '__abstractmethods__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__doc__',
 '__eq__',
 '__format__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__hash__',
 '__init__',
 '__iter__',
 '__len__',
 '__metaclass__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_cache',
 '_abc_negative_cache',
 '_abc_negative_cache_version',
 '_abc_registry',
 '_class',
 '_values',
 'clear',
 'copy',
 'fields',
 'get',
 'items',
 'iteritems',
 'iterkeys',
 'itervalues',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

I tried item.keys(), but that returns the keys in no particular order.

Upvotes: 1

Views: 2712

Answers (1)

elacuesta

Reputation: 911

The Item class has a dict interface and stores its values in the _values dict, which does not keep track of key order (https://github.com/scrapy/scrapy/blob/1.5/scrapy/item.py#L53). I believe you could subclass Item and override the __init__ method to make that container an OrderedDict:

from collections import OrderedDict

import six
from scrapy import Item, Field


class OrderedItem(Item):
    def __init__(self, *args, **kwargs):
        # Store values in an OrderedDict so assignment order is remembered.
        self._values = OrderedDict()
        if args or kwargs:  # avoid creating dict for most common case
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v

The item then preserves the order in which the values were assigned:

In [28]: class SomeItem(OrderedItem):
    ...:     a = Field()
    ...:     b = Field()
    ...:     c = Field()
    ...:     d = Field()
    ...: 
    ...: i = SomeItem()
    ...: i['b'] = 'bbb'
    ...: i['a'] = 'aaa'
    ...: i['d'] = 'ddd'
    ...: i['c'] = 'ccc'
    ...: i.items()
    ...: 
Out[28]: [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
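
To see why swapping the container is enough, here is a minimal sketch that runs without Scrapy installed. MiniItem is a hypothetical stand-in, not Scrapy's Item: it mimics only the part that matters here, a dict-like wrapper around _values. The field names and the error message are assumptions made for illustration.

```python
from collections import OrderedDict


class MiniItem(object):
    """Hypothetical stand-in for an Item: a dict-like wrapper around _values."""

    # Declared field names (assumed for this sketch).
    fields = ('a', 'b', 'c', 'd')

    def __init__(self, *args, **kwargs):
        # Using an OrderedDict as the container makes it remember
        # the order in which keys are first assigned.
        self._values = OrderedDict()
        for key, value in OrderedDict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Reject keys that were not declared as fields.
        if key not in self.fields:
            raise KeyError("%s does not support field: %s"
                           % (type(self).__name__, key))
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def items(self):
        # Return (key, value) pairs in assignment order.
        return list(self._values.items())


item = MiniItem()
item['b'] = 'bbb'
item['a'] = 'aaa'
item['d'] = 'ddd'
item['c'] = 'ccc'
print(item.items())
# [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
```

Note that this tracks the order in which values were assigned, not the order in which the fields were declared on the class.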

Upvotes: 6
