handloomweaver

Reputation: 5011

How can I iterate over a list of dictionaries and merge dictionaries to form new shorter list of dicts?

I have a list of airline flight fares. Each fare has a 'price' and a 'tickettypecode', indicates whether it is 'oneway' (as opposed to round trip), and maps to another list of journeys by integer code. But the list I receive contains duplicates.

[
{'price' : 1800, 'oneway' : 1, 'inboundJourneys' : [], "outboundJourneys": [3], 'tickettypecode' : 'SDS'},
{'price' : 1800, 'oneway' : 1, 'inboundJourneys' : [9,10,11], "outboundJourneys": [], 'tickettypecode' : 'SDS'},
{'price' : 1800, 'oneway' : 1, 'inboundJourneys' : [14,16], "outboundJourneys": [], 'tickettypecode' : 'SDS'},
{'price' : '2300', 'oneway' : 1, 'inboundJourneys' : [], "outboundJourneys": [6,8,9], 'tickettypecode' : 'TAR'},
{'price' : 2300, 'oneway' : 1, 'inboundJourneys' : [12,13,14], "outboundJourneys": [3], 'tickettypecode' : 'TAR'},
{'price' : 900, 'oneway' : 1, 'inboundJourneys' : [], "outboundJourneys": [18,19,20], 'tickettypecode' : 'GED'},
{'price' : 900, 'oneway' : 1, 'inboundJourneys' : [14,16,17], "outboundJourneys": [], 'tickettypecode' : 'GED'},
{'price' : 1200, 'oneway' : 1, 'inboundJourneys' : [], "outboundJourneys": [25], 'tickettypecode' : 'ABC'},
{'price' : 1200, 'oneway' : 1, 'inboundJourneys' : [32], "outboundJourneys": [], 'tickettypecode' : 'ABC'}
]

What I need is:

Wherever 'price', 'tickettypecode' and 'oneway' are all equal, there should be a single dictionary in the list with the journey lists combined, so that I end up with:

[
{'price' : 1800, 'oneway' : 1, 'inboundJourneys' : [9,10,11,14,16], "outboundJourneys": [3], 'tickettypecode' : 'SDS'},
{'price' : 2300, 'oneway' : 1, 'inboundJourneys' : [12,13,14], "outboundJourneys": ['6,8,9'], 'tickettypecode' : 'TAR'},
{'price' : 900, 'oneway' : 1, 'inboundJourneys' : [14,16,17], "outboundJourneys": [18,19,20], 'tickettypecode' : 'GED'},
{'price' : 1200, 'oneway' : 1, 'inboundJourneys' : [32], "outboundJourneys": [25], 'tickettypecode' : 'ABC'}
]

I've tried a lot of approaches but I'm stumped.

Upvotes: 2

Views: 488

Answers (4)

senderle

Reputation: 151007

In general, situations like this are best handled by dictionaries. For example:

l = [(1, 2, 3), (1, 2, 8), (2, 3, 9), (5, 6, 66),  
     (3, 4, 22), (4, 5, 24), (5, 6, 55), (3, 4, 11)]

Here we have a list of tuples. Now say we want two tuples to be "equal" iff the first two values in the tuple are equal, and we want to consolidate the later values. We can use tuples as dictionary keys; so for each tuple, we generate a key tuple like so. I'll define a function for clarity's sake here:

def get_key(tup):
    return tup[0:2]

This slices the tuple, returning a tuple with the first two values. For such a simple operation, a function might seem like overkill, but for more complicated operations, it makes things much clearer.

I'll also define a function that returns the extra data:

def get_extra(tup):
    return tup[2]

Now we create a dictionary:

consolidated_tuples = {}

and populate it:

for tup in l:
    key = get_key(tup)
    extra = get_extra(tup)
    if key not in consolidated_tuples:
        consolidated_tuples[key] = [extra]
    else:
        consolidated_tuples[key].append(extra)

This simply checks to see if the key is in the dictionary. If it's not, then it creates a list containing the last value in the tuple, and assigns that list to the key. If it is, then it appends the last value in the given tuple to the list (which already exists). This way, duplicates are consolidated; tuples that generate the same key lead to the same list, which is then populated with the various trailing values.
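Putting the pieces so far together, here is a minimal runnable sketch of the tuple example. Nothing new is introduced; the printed values are what the loop above should leave in the dictionary for this sample list.

```python
# Sample data from the example above.
l = [(1, 2, 3), (1, 2, 8), (2, 3, 9), (5, 6, 66),
     (3, 4, 22), (4, 5, 24), (5, 6, 55), (3, 4, 11)]

def get_key(tup):
    # The first two values decide whether two tuples count as "equal".
    return tup[0:2]

def get_extra(tup):
    # The trailing value is the data to consolidate.
    return tup[2]

consolidated_tuples = {}
for tup in l:
    key = get_key(tup)
    extra = get_extra(tup)
    if key not in consolidated_tuples:
        consolidated_tuples[key] = [extra]
    else:
        consolidated_tuples[key].append(extra)

print(consolidated_tuples[(1, 2)])  # [3, 8]
print(consolidated_tuples[(5, 6)])  # [66, 55]
print(len(consolidated_tuples))     # 5
```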

You can easily extend this approach to work with a list of dictionaries; it just gets a bit more complicated.

From this basic code, we can add some sophistication. For example, dictionaries have a setdefault method, which attempts to access a key in a dictionary, and if it cannot, creates it, assigns to it a default value, and returns that default value. That means that the above if... else statement can be compacted:

for tup in l:
    consolidated_tuples.setdefault(get_key(tup), []).append(get_extra(tup))
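To see what setdefault is doing in that one-liner, here is a small standalone sketch. The names d, first and second are just for illustration.

```python
d = {}

first = d.setdefault('a', [])   # key missing: stores [] under 'a' and returns it
first.append(1)

second = d.setdefault('a', [])  # key present: returns the list already stored
second.append(2)

print(d)                # {'a': [1, 2]}
print(first is second)  # True, both names refer to the same list
```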

An equivalent approach is to use defaultdict, which does the same thing as above behind the scenes:

import collections
consolidated_tuples = collections.defaultdict(list)

Every time a nonexistent key is accessed, defaultdict calls list, associates the result with that key, and returns the resulting empty list.

for tup in l:
    consolidated_tuples[get_key(tup)].append(get_extra(tup))
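One thing worth knowing about defaultdict, shown in the sketch below: indexing a missing key creates it, while a membership test does not. The variable name groups is only illustrative.

```python
import collections

groups = collections.defaultdict(list)
groups[(1, 2)].append(3)   # missing key: list() is called, then 3 is appended
groups[(1, 2)].append(8)

print((9, 9) in groups)    # False, "in" does not create the key
_ = groups[(9, 9)]         # plain indexing does create it
print((9, 9) in groups)    # True

# Convert to a plain dict if later lookups should not create keys.
print(dict(groups))        # {(1, 2): [3, 8], (9, 9): []}
```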

All you have to do now is rewrite get_key and get_extra to work with the data above.

>>> def get_key(d):
...     return (int(d['price']), d['oneway'], d['tickettypecode'])
... 
>>> def get_extra(d):
...     return (d['outboundJourneys'], d['inboundJourneys'])
... 
>>> merged_data = collections.defaultdict(list)
>>> for d in data:
...     merged_data[get_key(d)].append(get_extra(d))

The result can easily be transformed to resemble the initial structure; if you want to include 'price' and so on in the dictionaries, simply add them in the step below:

>>> for k in merged_data:
...     ob, ib = zip(*merged_data[k])
...     merged_data[k] = {'outboundJourneys': [x for js in ob for x in js],
...                       'inboundJourneys': [x for js in ib for x in js]}
... 
>>> merged_data
defaultdict(<type 'list'>, {
    (2300, 1, 'TAR'): 
        {'outboundJourneys': [6, 8, 9, 3], 'inboundJourneys': [12, 13, 14]}, 
    (1200, 1, 'ABC'): {'outboundJourneys': [25], 'inboundJourneys': [32]}, 
    (1800, 1, 'SDS'): 
        {'outboundJourneys': [3], 'inboundJourneys': [9, 10, 11, 14, 16]}, 
    (900, 1, 'GED'): 
        {'outboundJourneys': [18, 19, 20], 'inboundJourneys': [14, 16, 17]}
})
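As a rough sketch of how the key fields could be carried back into each dictionary, the grouping key can be unpacked while flattening. Here data is a four-row subset of the question's list, and grouped and flat are names chosen for illustration.

```python
import collections

# Four rows taken from the question's data (note the string price '2300').
data = [
    {'price': 1800, 'oneway': 1, 'inboundJourneys': [],
     'outboundJourneys': [3], 'tickettypecode': 'SDS'},
    {'price': 1800, 'oneway': 1, 'inboundJourneys': [9, 10, 11],
     'outboundJourneys': [], 'tickettypecode': 'SDS'},
    {'price': '2300', 'oneway': 1, 'inboundJourneys': [],
     'outboundJourneys': [6, 8, 9], 'tickettypecode': 'TAR'},
    {'price': 2300, 'oneway': 1, 'inboundJourneys': [12, 13, 14],
     'outboundJourneys': [3], 'tickettypecode': 'TAR'},
]

grouped = collections.defaultdict(list)
for d in data:
    key = (int(d['price']), d['oneway'], d['tickettypecode'])
    grouped[key].append((d['outboundJourneys'], d['inboundJourneys']))

flat = []
for (price, oneway, code), pairs in grouped.items():
    ob, ib = zip(*pairs)  # split the pairs into outbound lists and inbound lists
    flat.append({
        'price': price, 'oneway': oneway, 'tickettypecode': code,
        'outboundJourneys': [x for journeys in ob for x in journeys],
        'inboundJourneys': [x for journeys in ib for x in journeys],
    })

print(flat)
```

On Python 3.7 and later, dictionaries keep insertion order, so flat lists the fares in the order their keys were first seen.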

You could also write a function that, instead of simply appending the extra data to a list, would merge it in a more sophisticated way. In that case, defaultdict probably adds a bit of unnecessary complication; we can just use dict.get(key, default), which searches for a key and returns a default value if not found. Putting it all together, customized for the data above (here named flights):

def merge_dict(d1, d2, key_names):
    merged_d = d1.copy()
    merged_d.update(d2)
    merged_d.update((k, d1.get(k, []) + d2.get(k, [])) for k in key_names)
    return merged_d

merged = {}        
for d in flights:
    key = (int(d['price']), d['tickettypecode'], d['oneway'])
    cd = merged.get(key, {})
    merged[key] = merge_dict(cd, d, ('inboundJourneys', 'outboundJourneys'))

Result:

>>> merged
{(1200, 'ABC', 1): {'inboundJourneys': [32], 'price': 1200, 
     'outboundJourneys': [25], 'oneway': 1, 'tickettypecode': 'ABC'}, 
 (2300, 'TAR', 1): {'inboundJourneys': [12, 13, 14], 'price': 2300, 
     'outboundJourneys': [6, 8, 9, 3], 'oneway': 1, 'tickettypecode': 'TAR'}, 
 (1800, 'SDS', 1): {'inboundJourneys': [9, 10, 11, 14, 16], 'price': 1800, 
     'outboundJourneys': [3], 'oneway': 1, 'tickettypecode': 'SDS'}, 
 (900, 'GED', 1): {'inboundJourneys': [14, 16, 17], 'price': 900, 
     'outboundJourneys': [18, 19, 20], 'oneway': 1, 'tickettypecode': 'GED'}}
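To make the behaviour of merge_dict concrete, here is a small sketch that merges two of the 'GED' fares by hand. The names a, b, names, first and both are illustrative only. Note that the first call starts from an empty dictionary, exactly as merged.get(key, {}) does for an unseen key.

```python
def merge_dict(d1, d2, key_names):
    merged_d = d1.copy()
    merged_d.update(d2)
    # For the listed keys, concatenate instead of overwriting.
    merged_d.update((k, d1.get(k, []) + d2.get(k, [])) for k in key_names)
    return merged_d

a = {'price': 900, 'inboundJourneys': [], 'outboundJourneys': [18, 19, 20]}
b = {'price': 900, 'inboundJourneys': [14, 16, 17], 'outboundJourneys': []}
names = ('inboundJourneys', 'outboundJourneys')

first = merge_dict({}, a, names)    # nothing merged yet: effectively a copy of a
both = merge_dict(first, b, names)  # journey lists are concatenated

print(both['inboundJourneys'])   # [14, 16, 17]
print(both['outboundJourneys'])  # [18, 19, 20]
print(a['inboundJourneys'])      # [], the inputs are left untouched
```

If a list is needed at the end rather than a dictionary keyed by tuple, list(merged.values()) gives one.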

Upvotes: 0

Rik Poggi

Reputation: 29302

I would do it like this:

import copy

def merge(iterable, keys, update):
    merged = {}
    for d in iterable:
        merge_key = tuple(d[k] for k in keys)
        m = merged.get(merge_key)
        if m:
            for u in update:
                m[u].extend(d[u])
        else:
            merged[merge_key] = copy.deepcopy(d)

    return list(merged.values())  # list(dict_view)

I've tested it on your example:

keys = ('price','tickettypecode','oneway')
update = ('inboundJourneys','outboundJourneys')
merge(l, keys, update)

And I got:

[{'inboundJourneys': [32],
  'oneway': 1,
  'outboundJourneys': [25],
  'price': 1200,
  'tickettypecode': 'ABC'},
 {'inboundJourneys': [12, 13, 14],
  'oneway': 1,
  'outboundJourneys': [6, 8, 9, 3],
  'price': 2300,
  'tickettypecode': 'TAR'},
 {'inboundJourneys': [9, 10, 11, 14, 16],
  'oneway': 1,
  'outboundJourneys': [3],
  'price': 1800,
  'tickettypecode': 'SDS'},
 {'inboundJourneys': [14, 16, 17],
  'oneway': 1,
  'outboundJourneys': [18, 19, 20],
  'price': 900,
  'tickettypecode': 'GED'}]
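One caveat, assuming the input really contains a price stored as the string '2300' (as one row in the question's listing does): a key built from the raw values treats '2300' and 2300 as different, so those two fares would not be merged. The sketch below shows the difference. normalized_key is a hypothetical helper, not part of the merge function above.

```python
row_a = {'price': '2300', 'tickettypecode': 'TAR', 'oneway': 1}
row_b = {'price': 2300, 'tickettypecode': 'TAR', 'oneway': 1}
keys = ('price', 'tickettypecode', 'oneway')

def raw_key(d):
    # Same key construction as in merge(): values are used as they are.
    return tuple(d[k] for k in keys)

def normalized_key(d):
    # Hypothetical variant: coerce the price to int before building the key.
    return (int(d['price']),) + tuple(d[k] for k in keys[1:])

print(raw_key(row_a) == raw_key(row_b))                # False
print(normalized_key(row_a) == normalized_key(row_b))  # True
```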

Upvotes: 0

inspectorG4dget

Reputation: 113975

A horribly inefficient solution, but a starting point (it only drops the duplicate fares; it does not yet combine the journey lists):

answer = []
for myDict in myList:
    for d in answer:
        if d['oneway']==myDict['oneway'] and d['price']==myDict['price'] and d['tickettypecode']==myDict['tickettypecode']:
            break
    else:
        answer.append(myDict)

Hope this helps
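For what it's worth, the same nested loop can be taken one step further so that it also combines the journey lists. This is only a sketch under the assumption that the key is 'tickettypecode' as in the question's data; merge_quadratic is a made-up name, and the approach is still quadratic in the number of fares.

```python
import copy

def merge_quadratic(my_list):
    answer = []
    for my_dict in my_list:
        for d in answer:
            if (d['oneway'] == my_dict['oneway']
                    and d['price'] == my_dict['price']
                    and d['tickettypecode'] == my_dict['tickettypecode']):
                # Same fare already collected: append its journeys.
                d['inboundJourneys'].extend(my_dict['inboundJourneys'])
                d['outboundJourneys'].extend(my_dict['outboundJourneys'])
                break
        else:
            # No match found: keep a deep copy so the input is not modified later.
            answer.append(copy.deepcopy(my_dict))
    return answer

fares = [
    {'price': 1200, 'oneway': 1, 'inboundJourneys': [],
     'outboundJourneys': [25], 'tickettypecode': 'ABC'},
    {'price': 1200, 'oneway': 1, 'inboundJourneys': [32],
     'outboundJourneys': [], 'tickettypecode': 'ABC'},
]
result = merge_quadratic(fares)
print(result)
```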

Upvotes: 0

Francis Avila

Reputation: 31631

Assuming order of items in the merged list does not matter, simply go through each item in the list and copy it if you haven't seen it before or merge the fields if you have.

merged = {}

for item in original:
    key = (item['price'], item['tickettypecode'], item['oneway'])
    if key in merged:
        for mergekey in ['inboundJourneys','outboundJourneys']:
            # assign extended copy rather than using list.extend()
            merged[key][mergekey] = merged[key][mergekey] + item[mergekey]
    else:
        merged[key] = item.copy()

mergedlist = list(merged.values())  # list() is needed on Python 3, where values() returns a view
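The comment about assigning an extended copy matters because item.copy() is a shallow copy: the copied dictionary still shares its list objects with the original item. A short sketch of the difference, using two of the question's 'ABC' fares (the name shallow is illustrative):

```python
original = [
    {'price': 1200, 'oneway': 1, 'inboundJourneys': [],
     'outboundJourneys': [25], 'tickettypecode': 'ABC'},
    {'price': 1200, 'oneway': 1, 'inboundJourneys': [32],
     'outboundJourneys': [], 'tickettypecode': 'ABC'},
]

shallow = original[0].copy()
# The copy shares its lists with the source dictionary.
print(shallow['inboundJourneys'] is original[0]['inboundJourneys'])  # True

# Concatenation builds a new list, so the input stays untouched.
shallow['inboundJourneys'] = shallow['inboundJourneys'] + original[1]['inboundJourneys']
print(shallow['inboundJourneys'])      # [32]
print(original[0]['inboundJourneys'])  # []
```

Using list.extend() on the shared list instead would have changed original[0] as well.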

Upvotes: 3
