Reputation: 36307
I'm interested in keeping a reference to the order of the field names in a Scrapy item. Where is this stored?
>>> dir(item)
Out[7]:
['_MutableMapping__marker',
'__abstractmethods__',
'__class__',
'__contains__',
'__delattr__',
'__delitem__',
'__dict__',
'__doc__',
'__eq__',
'__format__',
'__getattr__',
'__getattribute__',
'__getitem__',
'__hash__',
'__init__',
'__iter__',
'__len__',
'__metaclass__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__setitem__',
'__sizeof__',
'__slots__',
'__str__',
'__subclasshook__',
'__weakref__',
'_abc_cache',
'_abc_negative_cache',
'_abc_negative_cache_version',
'_abc_registry',
'_class',
'_values',
'clear',
'copy',
'fields',
'get',
'items',
'iteritems',
'iterkeys',
'itervalues',
'keys',
'pop',
'popitem',
'setdefault',
'update',
'values']
I tried item.keys(), but that returns the keys in no particular order.
Upvotes: 1
Views: 2712
Reputation: 911
The Item class has a dict interface and stores its values in the _values dict, which does not keep track of key order (https://github.com/scrapy/scrapy/blob/1.5/scrapy/item.py#L53). I believe you could subclass Item and override the __init__ method to make that container an OrderedDict:
from collections import OrderedDict

import six
from scrapy import Item


class OrderedItem(Item):
    def __init__(self, *args, **kwargs):
        # Use an OrderedDict as the backing store so assignment order is kept
        self._values = OrderedDict()
        if args or kwargs:  # avoid creating dict for most common case
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v
The item then preserves the order in which the values were assigned:
In [28]: class SomeItem(OrderedItem):
    ...:     a = Field()
    ...:     b = Field()
    ...:     c = Field()
    ...:     d = Field()
    ...:
    ...: i = SomeItem()
    ...: i['b'] = 'bbb'
    ...: i['a'] = 'aaa'
    ...: i['d'] = 'ddd'
    ...: i['c'] = 'ccc'
    ...: i.items()
    ...:
Out[28]: [('b', 'bbb'), ('a', 'aaa'), ('d', 'ddd'), ('c', 'ccc')]
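If you want to see the mechanism without installing Scrapy, here is a minimal sketch of the same idea. MiniItem is a hypothetical stand-in I made up for illustration, not Scrapy's actual Item class; it only assumes that the item is a dict-like wrapper around a _values container, as described above. Swapping that container for an OrderedDict is what makes iteration follow assignment order.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class MiniItem(MutableMapping):
    """Hypothetical stand-in for scrapy.Item: a dict-like wrapper around _values."""

    def __init__(self, *args, **kwargs):
        # Ordered backing store, mirroring what OrderedItem does above
        self._values = OrderedDict()
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        # Iteration order is whatever order the backing store keeps
        return iter(self._values)

    def __len__(self):
        return len(self._values)


item = MiniItem()
item['b'] = 'bbb'
item['a'] = 'aaa'
item['d'] = 'ddd'
item['c'] = 'ccc'

ordered_pairs = list(item.items())
print(ordered_pairs)
```

Note that this tracks the order in which values were assigned, not the order in which the fields were declared on the class.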
Upvotes: 6