Reputation: 543

Python - Fast approach to merging two list of lists

I have two list of lists (containing integers), the first list is the "seed" list, the second is considered the "new" list. I want to grow the seed list with lists that are in the new list but not in the seed list, plus lists that are similar (not disjoint) to one another between the seed and new list by list(set(seedList[index1] + newList[index2])). The end result will be the seed list having grown from unique lists within the new list, and expanding similar lists between the seed and new lists. Wow, not sure if that makes sense.

Seed List [[1,2,3], [4,5], [6,7,8]]

New List [[1,2], [6,7], [9]]

Final List [[1,2,3], [4,5], [6,7,8], [9]]

** Edit **

I realized after posting this problem that I perhaps forced my self into a corner by seeking a solution that required a list of lists. The greater context is there are two files: seed and new. Each file has a logical grouping of IDs per row. I need to merge the two files but keep the logical grouping of each row. So if two rows are the same between the seed and new file (i.e. 1,2,3 == 1,2,3), leave the row in the seed file alone, if two rows are "similar" (not disjoint, i.e. 6,7,8 ~ 6,7)) then merge the two rows (no duplicates) and update the row in the seed file, if there is a unique row in the new file (not in the seed file, i.e. 9) add it to the seed file.

Seed File

1,2,3
4,5
6,7,8

New File

1,2
6,7
9

Final File

1,2,3
4,5
6,7,8
9

Upvotes: 0

Answers (3)

t.m.adam

Reputation: 15376

It 's a bit messy but it works

Seed_List = [[1,2,3], [4,5], [6,7,8]]
New_List = [[1,2], [6,7], [9]]

Seed_List += [ 
    nl for nl in New_List 
    if sum([ 1 for sl in Seed_List if all(l in sl for l in nl) ]) == 0 
]

Output :

print(Seed_List)
[[1, 2, 3], [4, 5], [6, 7, 8], [9]]

I hope it 's helpful

Upvotes: 1

Michael Hoff

Reputation: 6348

For huge amounts of data the following approach using an index map might be faster. If you want to merge multiple times, you might want to keep the index structure up-to-date (not sure if that makes a big difference, though).

seeds = list(map(set, [[1,2,3], [4,5], [6,7,8]]))
groups = list(map(set, [[1,2], [6,7], [9]]))

def merge(seeds, groups, idx=None):
    if idx is None:
        idx = dict((k, s) for s in seeds for k in s)
    mapped_values = set(idx.keys())
    for new_group in groups:
        common_values = new_group.intersection(mapped_values)
        if common_values:
            mapped_group = idx[common_values.pop()]
            # assert all(mapped_group == idx[v] for v in common_values)
            mapped_group.update(new_group)
        else:
            mapped_group = new_group
            seeds.append(new_group)
        #############################################
        # keep index up-to-date, required only if...
        # - we would want to reuse the index
        # - new groups are not pairwise disjoint
        #
        # for v in new_group:
        #     idx[v] = mapped_group
        # mapped_values.update(new_group)
        #############################################

print(seeds)
print(groups)
merge(seeds, groups)
print(seeds)

# [set([1, 2, 3]), set([4, 5]), set([8, 6, 7])]
# [set([1, 2]), set([6, 7]), set([9])]
# [set([1, 2, 3]), set([4, 5]), set([8, 6, 7]), set([9])]

Condensed version, assuming all preconditions and 'optimized' for one merge only:

def merge(seeds, groups):
    idx = dict((k, s) for s in seeds for k in s)
    mapped_values = set(idx.keys())
    for new_group in groups:
        common_values = new_group.intersection(mapped_values)
        if common_values:
            idx[common_values.pop()].update(new_group)
        else:
            seeds.append(new_group)

Upvotes: 1

binu.py

Reputation: 1216

seed = [[1,2,3], [4,5], [6,7,8]]

new = [[1,2], [6,7], [9]]

flat_seed = [i for j in seed for i in j]
flat_new = [i for j in new for i in j]

seed.append([])

for i in flat_new:
    if i not in flat_seed:
        seed[-1].append(i)
        flat_seed.append(i)

print seed

Upvotes: 0

Python - Fast approach to merging two list of lists

Answers (3)

Related Questions