khinester

Reputation: 3530

sorting python dictionary list

I have this code:

from BeautifulSoup import BeautifulSoup

TABLE_CONTENT = [['958','<a id="958F" href="javascript:c_row(\'958\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'958\')" title="click date time to show origin_list (evid=958)">1945/07/16 11:29:45</a>','33.6753','-106.4747','','-.03','21','','','TRINITY','&nbsp;','&nbsp;','<a href="javascript:c_md(\'958\')" title="click here to show source data">SourceData</a>','&nbsp;'],['959','<a id="959F" href="javascript:c_row(\'959\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'959\')" title="click date time to show origin_list (evid=959)">1945/08/05 23:16:02</a>','34.395','132.4538','','-.58','15','','','LITTLEBOY','&nbsp;','&nbsp;','<a href="javascript:c_md(\'959\')" title="click here to show source data">SourceData</a>','&nbsp;']]

EVENT_LIST = []
for EVENT in TABLE_CONTENT:
    events = {}
    for index, item in enumerate(EVENT):
        if index == 0:
            events['id'] = item
        if index == 4:
            soup = BeautifulSoup(item)
            for a in soup.findAll('a'):
                events['date'] = ''.join(a.findAll(text=True))
        if index == 2:
            events['country'] = item
        if index == 3:
            events['type'] = item
        if index == 5:
            events['lat'] = item
        if index == 6:
            events['lon'] = item
        if index == 8:
            events['depth'] = item
        if index == 9:
            events['yield'] = item
        if index == 12:
            events['name'] = item
    sorted(events, key=lambda key: events['id'])
    EVENT_LIST.append(events)
    print '=== new record ==='
EVENT_LIST.sort(key=lambda x: x['id'])
print EVENT_LIST

The first issue I have is that, within EVENT_LIST, the dictionary keys are not in the same order in which they were added. For example, 'lat' and 'lon' are out of order when I print the results:

[{'name': 'TRINITY', 'country': 'USA', 'lon': '-106.4747', 'yield': '21', 'lat': '33.6753', 'depth': '-.03', 'date': u'1945/07/16 11:29:45', 'type': 'Atmospheric', 'id': '958'}, {'name': 'LITTLEBOY', 'country': 'USA', 'lon': '132.4538', 'yield': '15', 'lat': '34.395', 'depth': '-.58', 'date': u'1945/08/05 23:16:02', 'type': 'Atmospheric', 'id': '959'}]

Also, is there a better way to write this code?

Upvotes: 1

Views: 192

Answers (5)

Ruggero Turra

Reputation: 17740

First some comments about your code:

  1. Why do you call it events when it is created inside the loop? It holds only one event.
  2. Why do you reuse the events name for different events? That can be dangerous, for example if an event is badly formatted (say, missing an item).
  3. sorted is a no-op in your code: it returns a new list, which you discard, and it has no side effects.
  4. Why do you use CAPITALS for non-constant variables?
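
To illustrate point 3, here is a minimal sketch (the sample data is made up for illustration, not taken from your table). Iterating a dict yields its keys, so sorted() on a dict returns a new list of keys and leaves the dict untouched, whereas list.sort() sorts a list in place:

```python
# Hypothetical sample record, only for illustration.
event = {'id': '958', 'name': 'TRINITY', 'country': 'USA'}

# sorted() builds and returns a NEW list of the dict's keys;
# if the result is not assigned, nothing changes.
keys = sorted(event)
print(keys)  # ['country', 'id', 'name']

# list.sort() sorts the list in place and returns None.
records = [{'id': '959'}, {'id': '958'}]
result = records.sort(key=lambda r: r['id'])
print(result)   # None
print(records)  # [{'id': '958'}, {'id': '959'}]
```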

The dictionary ordering is not a real issue, it is a feature: dict is hash-based, so the order in which keys come out depends on their hashes, not on the order in which you inserted them. If you really need to preserve the insertion order, you can use collections.OrderedDict.

Here is an example:

import operator

event_list = []
for event in TABLE_CONTENT:
    event_dict = {}
    event_dict['id'] = event[0]
    event_dict['country'] = event[2]
    # ...
    event_dict['name'] = event[12]
    event_list.append(event_dict)
event_list = sorted(event_list, key = operator.itemgetter('id'))
print event_list
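
One caveat with sorting on 'id' as in the example above: the ids are strings, so they are compared lexicographically. A small sketch with made-up ids, assuming every id is a decimal integer string:

```python
import operator

# Made-up ids for illustration; stored as strings like in the question.
event_list = [{'id': '1000'}, {'id': '958'}, {'id': '959'}]

# String comparison: '1000' sorts before '958'.
by_string = sorted(event_list, key=operator.itemgetter('id'))
print([e['id'] for e in by_string])  # ['1000', '958', '959']

# Numeric comparison (assumes every id parses as an integer).
by_number = sorted(event_list, key=lambda e: int(e['id']))
print([e['id'] for e in by_number])  # ['958', '959', '1000']
```

With the two ids in the question the result is the same either way; the difference only shows once the ids have different lengths.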

Upvotes: 0

hochl

Reputation: 12960

You can preserve the order of insertions into a dictionary by using an OrderedDict container. From the manual:

Return an instance of a dict subclass, supporting the usual dict methods. An OrderedDict is a dict that remembers the order that keys were first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged. Deleting an entry and reinserting it will move it to the end.

This feature has only been around since version 2.7.
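
A minimal sketch of the three behaviours described in that quote, using a few of the keys from the question (the values are just sample data). As a side note, since Python 3.7 a plain dict also preserves insertion order, so this matters mostly for older versions such as the 2.x used here:

```python
from collections import OrderedDict

event = OrderedDict()
event['id'] = '958'
event['lat'] = '33.6753'
event['lon'] = '-106.4747'
print(list(event.keys()))  # ['id', 'lat', 'lon']

# Overwriting an existing key keeps its original position.
event['id'] = '959'
print(list(event.keys()))  # ['id', 'lat', 'lon']

# Deleting and reinserting a key moves it to the end.
del event['id']
event['id'] = '959'
print(list(event.keys()))  # ['lat', 'lon', 'id']
```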

Better way: you could change the subsequent if index == ... tests to elif index == ..., since if the index is 2 it can never be 5. Or you could store the index/key combinations in a dict and use it to fill in your items. Example (not tried):

combos={
        0: 'id',
        2: 'country',
        3: 'type',
        5: 'lat',
        6: 'lon',
        8: 'depth',
        9: 'yield',
        12: 'name' }

...

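for index, item ...:
    if index == 4:
        # the date is wrapped in an <a> tag; keep only its text
        soup = BeautifulSoup(item)
        for a in soup.findAll('a'):
            events['date'] = ''.join(a.findAll(text=True))
    elif index in combos:
        # plain columns: look up the key name for this index
        events[combos[index]] = item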

I think you get the idea.

Upvotes: 1

Emmett Butler

Reputation: 6207

https://stackoverflow.com/a/526131/735204

Dictionaries are unordered by definition, as they're stored internally as hash tables. The lack of ordering is a consequence of the algorithm by which keys are inserted and removed from the hash table. Thus, you should never depend on a dictionary's keys being in any particular order. Maybe consider using a tuple instead, or a list of dictionaries - the latter will allow you to maintain a key:value format while also guaranteeing a reliable ordering.
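
As a rough sketch of the tuple idea, one option is collections.namedtuple, which keeps the fields in the declared order while still allowing access by name. The Event type and its field selection below are hypothetical, chosen only for illustration (note that 'yield' is a Python keyword and could not be used as a field name as-is):

```python
from collections import namedtuple

# Hypothetical record type; the field names mirror some of the keys
# used in the question and are fixed in this order.
Event = namedtuple('Event', ['id', 'date', 'country', 'lat', 'lon', 'name'])

events = [
    Event('959', '1945/08/05 23:16:02', 'USA', '34.395', '132.4538', 'LITTLEBOY'),
    Event('958', '1945/07/16 11:29:45', 'USA', '33.6753', '-106.4747', 'TRINITY'),
]

# Sort by id; fields are reachable by name or by position.
events.sort(key=lambda e: e.id)
print(events[0].name)  # TRINITY
print(Event._fields)   # the field order is always the declared one
```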

If you're really set on using a dictionary, you might also want to look at OrderedDict, although IMHO if you're using a dict and requiring it to be ordered, you're thinking about the data the wrong way and there's probably a simpler way to do it. http://docs.python.org/library/collections.html#collections.OrderedDict

For the curious, this is a great presentation explaining exactly why Python dictionaries have undefined orderings: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-the-mighty-dictionary-55-3352147

Upvotes: 1

MostafaR

Reputation: 3695

Better code for your conversion:

from BeautifulSoup import BeautifulSoup

HEADERS = ['id', None, 'country', 'type', 'date', 'lat', 'lon', None, 'depth', 'yield', None, None, 'name']
TABLE_CONTENT = [['958','<a id="958F" href="javascript:c_row(\'958\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'958\')" title="click date time to show origin_list (evid=958)">1945/07/16 11:29:45</a>','33.6753','-106.4747','','-.03','21','','','TRINITY','&nbsp;','&nbsp;','<a href="javascript:c_md(\'958\')" title="click here to show source data">SourceData</a>','&nbsp;'],['959','<a id="959F" href="javascript:c_row(\'959\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'959\')" title="click date time to show origin_list (evid=959)">1945/08/05 23:16:02</a>','34.395','132.4538','','-.58','15','','','LITTLEBOY','&nbsp;','&nbsp;','<a href="javascript:c_md(\'959\')" title="click here to show source data">SourceData</a>','&nbsp;']]

EVENT_LIST = []
for EVENT in TABLE_CONTENT:
    events = {}
    # zip() stops at the shorter sequence, so the trailing columns
    # that have no entry in HEADERS are ignored
    for header, item in zip(HEADERS, EVENT):
        if header is None:
            continue  # column we do not want to keep
        if header == 'date':
            # the date is wrapped in an <a> tag; keep only its text
            soup = BeautifulSoup(item)
            for a in soup.findAll('a'):
                events[header] = ''.join(a.findAll(text=True))
        else:
            events[header] = item
    EVENT_LIST.append(events)
    print '=== new record ==='
EVENT_LIST.sort(key=lambda x: x['id'])
print EVENT_LIST

Upvotes: 0

Phil

Reputation: 7106

Dictionaries in Python are unordered by default.

You can use collections.OrderedDict instead. Note that it is only available in Python 2.7+.

Upvotes: 0
