Reputation: 3530
I have this code:
from BeautifulSoup import BeautifulSoup
TABLE_CONTENT = [['958','<a id="958F" href="javascript:c_row(\'958\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'958\')" title="click date time to show origin_list (evid=958)">1945/07/16 11:29:45</a>','33.6753','-106.4747','','-.03','21','','','TRINITY',' ',' ','<a href="javascript:c_md(\'958\')" title="click here to show source data">SourceData</a>',' '],['959','<a id="959F" href="javascript:c_row(\'959\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'959\')" title="click date time to show origin_list (evid=959)">1945/08/05 23:16:02</a>','34.395','132.4538','','-.58','15','','','LITTLEBOY',' ',' ','<a href="javascript:c_md(\'959\')" title="click here to show source data">SourceData</a>',' ']]
EVENT_LIST = []
for EVENT in TABLE_CONTENT:
    events = {}
    for index, item in enumerate(EVENT):
        if index == 0:
            events['id'] = item
        if index == 4:
            soup = BeautifulSoup(item)
            for a in soup.findAll('a'):
                events['date'] = ''.join(a.findAll(text=True))
        if index == 2:
            events['country'] = item
        if index == 3:
            events['type'] = item
        if index == 5:
            events['lat'] = item
        if index == 6:
            events['lon'] = item
        if index == 8:
            events['depth'] = item
        if index == 9:
            events['yield'] = item
        if index == 12:
            events['name'] = item
    sorted(events, key=lambda key: events['id'])
    EVENT_LIST.append(events)
    print '=== new record ==='
EVENT_LIST.sort(key=lambda x: x['id'])
print EVENT_LIST
The first issue I have is that the keys of the dictionaries in EVENT_LIST are not in the order in which they were added. For example, 'lat' and 'lon' are out of order when I print the results:
[{'name': 'TRINITY', 'country': 'USA', 'lon': '-106.4747', 'yield': '21', 'lat': '33.6753', 'depth': '-.03', 'date': u'1945/07/16 11:29:45', 'type': 'Atmospheric', 'id': '958'}, {'name': 'LITTLEBOY', 'country': 'USA', 'lon': '132.4538', 'yield': '15', 'lat': '34.395', 'depth': '-.58', 'date': u'1945/08/05 23:16:02', 'type': 'Atmospheric', 'id': '959'}]
Also, is there a better way to write this code?
Upvotes: 1
Views: 192
Reputation: 17740
First some comments about your code:
Why is the variable named events when it is created inside the loop? It holds only one event.
Why reuse the same events variable for different events? That can be dangerous, for example if an event is badly formatted and is missing an item.
The call to sorted is a no-op in your code: its result is discarded and it has no side effects.
The dictionary ordering is not a real problem, it is a feature: the keys are arranged by their hash because dict is hash-based. If you really need to preserve the insertion order, you can use collections.OrderedDict.
By the way, here is an example:
import operator
event_list = []
for event in TABLE_CONTENT:
    event_dict = {}
    event_dict['id'] = event[0]
    event_dict['country'] = event[2]
    # ...
    event_dict['name'] = event[12]
    event_list.append(event_dict)
event_list = sorted(event_list, key=operator.itemgetter('id'))
print event_list
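To illustrate the remark about sorted, here is a minimal sketch. The records list is made up for the illustration (only the string ids mirror the question). It shows that sorted returns a new list and leaves its argument alone, and that string ids sort character by character unless the key converts them to numbers.

```python
import operator

# Hypothetical sample records; the ids are strings, as in the question.
records = [{'id': '959'}, {'id': '1000'}, {'id': '958'}]

# sorted() returns a new list and leaves its argument untouched,
# so calling it without using the result has no effect.
by_text = sorted(records, key=operator.itemgetter('id'))
unchanged = [r['id'] for r in records]

# String ids compare character by character, so '1000' sorts before '958'.
text_order = [r['id'] for r in by_text]

# Converting to int inside the key function gives numeric order instead.
by_number = sorted(records, key=lambda r: int(r['id']))
numeric_order = [r['id'] for r in by_number]
```

If the ids always have the same number of digits, as in the two sample rows, both orderings agree and the int conversion is optional.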
Upvotes: 0
Reputation: 12960
You can preserve the order of insertions into a dictionary by using an OrderedDict container. From the manual:
Return an instance of a dict subclass, supporting the usual dict methods. An OrderedDict is a dict that remembers the order that keys were first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged. Deleting an entry and reinserting it will move it to the end.
This feature has only been around since version 2.7.
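A short sketch of the behaviour described in that quote, using a few fields and values from the question. The variable names are only for illustration.

```python
from collections import OrderedDict

# One event reduced to a few fields (values taken from the question).
event = OrderedDict()
event['id'] = '958'
event['country'] = 'USA'
event['lat'] = '33.6753'
event['lon'] = '-106.4747'

# Keys come back in the order in which they were first inserted.
keys = list(event.keys())

# Overwriting an existing key keeps its original position.
event['country'] = 'US'
keys_after_update = list(event.keys())

# Deleting a key and reinserting it moves it to the end.
del event['id']
event['id'] = '958'
keys_after_reinsert = list(event.keys())
```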
As for a better way: you could change the subsequent if index == ... tests to elif index == ..., since if the index is 2 it can never be 5. Or you could store the index/key combinations in a dictionary and use it to assign your items. Example (not tried):
combos = {
    0: 'id',
    2: 'country',
    3: 'type',
    5: 'lat',
    6: 'lon',
    8: 'depth',
    9: 'yield',
    12: 'name',
}
# ...
for index, item in enumerate(EVENT):
    if index == 4:
        # the date is wrapped in an <a> tag, so extract its text
        soup = BeautifulSoup(item)
        for a in soup.findAll('a'):
            events['date'] = ''.join(a.findAll(text=True))
    elif index in combos:
        events[combos[index]] = item
I think you get the idea.
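For completeness, a self-contained sketch of the same index-to-key idea. Two things here are assumptions made for the illustration and are not part of the answer: a crude regular expression stands in for BeautifulSoup so the snippet runs with the standard library only, and the date column (index 4) is added to the mapping so every column goes through the same path. The function name row_to_event and the shortened row are hypothetical.

```python
import re

# Column index -> dictionary key. Index 4 ('date') is added here as an assumption.
COMBOS = {0: 'id', 2: 'country', 3: 'type', 4: 'date', 5: 'lat',
          6: 'lon', 8: 'depth', 9: 'yield', 12: 'name'}

# Assumption: a simple regex replaces BeautifulSoup for this sketch.
# It is good enough for these flat <a>...</a> snippets, not for general HTML.
TAG = re.compile(r'<[^>]+>')


def row_to_event(row):
    """Build one event dict from one table row, skipping unmapped columns."""
    event = {}
    for index, item in enumerate(row):
        if index in COMBOS:
            event[COMBOS[index]] = TAG.sub('', item)
    return event


# Shortened, hypothetical row with the same column layout as TABLE_CONTENT.
row = ['958', '<a href="#">map</a>', 'USA', 'Atmospheric',
       '<a href="#" title="t">1945/07/16 11:29:45</a>', '33.6753', '-106.4747',
       '', '-.03', '21', '', '', 'TRINITY']
event = row_to_event(row)
```

Stripping tags from every mapped column is harmless for the plain-text values and removes the special case for index 4.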
Upvotes: 1
Reputation: 6207
https://stackoverflow.com/a/526131/735204
Dictionaries are unordered by definition, as they're stored internally as hash tables. The lack of ordering is a consequence of the algorithm by which keys are inserted and removed from the hash table. Thus, you should never depend on a dictionary's keys being in any particular order. Maybe consider using a tuple instead, or a list of dictionaries - the latter will allow you to maintain a key:value format while also guaranteeing a reliable ordering.
If you're really set on using a dictionary, you might also want to look at OrderedDict, although IMHO if you're using a dict and requiring it to be ordered, you're thinking about the data the wrong way and there's probably a simpler way to do it. http://docs.python.org/library/collections.html#collections.OrderedDict
For the curious, this is a great presentation explaining exactly why Python dictionaries have undefined orderings: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-the-mighty-dictionary-55-3352147
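One possible reading of the tuple suggestion is collections.namedtuple, which fixes the field order in the type itself. This is only a sketch: the Event type and its field list are assumptions for the illustration, with values copied from the question.

```python
from collections import namedtuple

# The field order is fixed by the definition, not by a hash table.
Event = namedtuple('Event', ['id', 'country', 'type', 'date', 'lat', 'lon'])

trinity = Event(id='958', country='USA', type='Atmospheric',
                date='1945/07/16 11:29:45', lat='33.6753', lon='-106.4747')

# Fields are available by name and always in declaration order.
fields = Event._fields
lat_lon = (trinity.lat, trinity.lon)
first_value = tuple(trinity)[0]
```

Unlike a dict, a namedtuple is immutable, so this fits records that are built once and then only read.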
Upvotes: 1
Reputation: 3695
Better code for your conversion:
from BeautifulSoup import BeautifulSoup
HEADERS = ['id', None, 'country', 'type', 'date', 'lat', 'lon', None, 'depth', 'yield', None, None, 'name']
TABLE_CONTENT = [['958','<a id="958F" href="javascript:c_row(\'958\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'958\')" title="click date time to show origin_list (evid=958)">1945/07/16 11:29:45</a>','33.6753','-106.4747','','-.03','21','','','TRINITY',' ',' ','<a href="javascript:c_md(\'958\')" title="click here to show source data">SourceData</a>',' '],['959','<a id="959F" href="javascript:c_row(\'959\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'959\')" title="click date time to show origin_list (evid=959)">1945/08/05 23:16:02</a>','34.395','132.4538','','-.58','15','','','LITTLEBOY',' ',' ','<a href="javascript:c_md(\'959\')" title="click here to show source data">SourceData</a>',' ']]
EVENT_LIST = []
for EVENT in TABLE_CONTENT:
    events = {}
    # zip stops at the shorter sequence, so the trailing columns
    # that have no entry in HEADERS are ignored
    for index, (header, item) in enumerate(zip(HEADERS, EVENT)):
        if header is None:
            continue  # column we do not want to keep
        if index == 4:
            # the date is wrapped in an <a> tag, so extract its text
            soup = BeautifulSoup(item)
            for a in soup.findAll('a'):
                events[header] = ''.join(a.findAll(text=True))
        else:
            events[header] = item
    EVENT_LIST.append(events)
    print '=== new record ==='
EVENT_LIST.sort(key=lambda x: x['id'])
print EVENT_LIST
Upvotes: 0
Reputation: 7106
Dictionaries in Python are unordered by default.
You can use OrderedDict instead. Note that it is only available in Python 2.7+.
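As a side note for readers on newer interpreters: plain dicts keep insertion order as a language guarantee from Python 3.7 on, which postdates this thread. A hedged sketch that picks the container by version (the name ordered_type is hypothetical):

```python
import sys
from collections import OrderedDict

# Plain dicts keep insertion order from Python 3.7 on;
# older versions (including 2.7) need OrderedDict for that.
ordered_type = dict if sys.version_info >= (3, 7) else OrderedDict

event = ordered_type()
event['id'] = '959'
event['lat'] = '34.395'
event['lon'] = '132.4538'

# Iterating yields the keys in insertion order in both cases.
keys = list(event)
```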
Upvotes: 0