Reputation: 543
I have two list of lists (containing integers), the first list is the "seed" list, the second is considered the "new" list. I want to grow the seed list with lists that are in the new list but not in the seed list, plus lists that are similar (not disjoint) to one another between the seed and new list by list(set(seedList[index1] + newList[index2]))
. The end result will be the seed list having grown from unique lists within the new list, and expanding similar lists between the seed and new lists. Wow, not sure if that makes sense.
Seed List
[[1,2,3], [4,5], [6,7,8]]
New List
[[1,2], [6,7], [9]]
Final List
[[1,2,3], [4,5], [6,7,8], [9]]
** Edit **
I realized after posting this problem that I perhaps forced my self into a corner by seeking a solution that required a list of lists. The greater context is there are two files: seed and new. Each file has a logical grouping of IDs per row. I need to merge the two files but keep the logical grouping of each row. So if two rows are the same between the seed and new file (i.e. 1,2,3 == 1,2,3), leave the row in the seed file alone, if two rows are "similar" (not disjoint, i.e. 6,7,8 ~ 6,7)) then merge the two rows (no duplicates) and update the row in the seed file, if there is a unique row in the new file (not in the seed file, i.e. 9) add it to the seed file.
Seed File
1,2,3
4,5
6,7,8
New File
1,2
6,7
9
Final File
1,2,3
4,5
6,7,8
9
Upvotes: 0
Views: 135
Reputation: 15376
It 's a bit messy but it works
Seed_List = [[1,2,3], [4,5], [6,7,8]]
New_List = [[1,2], [6,7], [9]]
Seed_List += [
nl for nl in New_List
if sum([ 1 for sl in Seed_List if all(l in sl for l in nl) ]) == 0
]
Output :
print(Seed_List)
[[1, 2, 3], [4, 5], [6, 7, 8], [9]]
I hope it 's helpful
Upvotes: 1
Reputation: 6348
For huge amounts of data the following approach using an index map might be faster. If you want to merge
multiple times, you might want to keep the index structure up-to-date (not sure if that makes a big difference, though).
seeds = list(map(set, [[1,2,3], [4,5], [6,7,8]]))
groups = list(map(set, [[1,2], [6,7], [9]]))
def merge(seeds, groups, idx=None):
if idx is None:
idx = dict((k, s) for s in seeds for k in s)
mapped_values = set(idx.keys())
for new_group in groups:
common_values = new_group.intersection(mapped_values)
if common_values:
mapped_group = idx[common_values.pop()]
# assert all(mapped_group == idx[v] for v in common_values)
mapped_group.update(new_group)
else:
mapped_group = new_group
seeds.append(new_group)
#############################################
# keep index up-to-date, required only if...
# - we would want to reuse the index
# - new groups are not pairwise disjoint
#
# for v in new_group:
# idx[v] = mapped_group
# mapped_values.update(new_group)
#############################################
print(seeds)
print(groups)
merge(seeds, groups)
print(seeds)
# [set([1, 2, 3]), set([4, 5]), set([8, 6, 7])]
# [set([1, 2]), set([6, 7]), set([9])]
# [set([1, 2, 3]), set([4, 5]), set([8, 6, 7]), set([9])]
Condensed version, assuming all preconditions and 'optimized' for one merge only:
def merge(seeds, groups):
idx = dict((k, s) for s in seeds for k in s)
mapped_values = set(idx.keys())
for new_group in groups:
common_values = new_group.intersection(mapped_values)
if common_values:
idx[common_values.pop()].update(new_group)
else:
seeds.append(new_group)
Upvotes: 1
Reputation: 1216
seed = [[1,2,3], [4,5], [6,7,8]]
new = [[1,2], [6,7], [9]]
flat_seed = [i for j in seed for i in j]
flat_new = [i for j in new for i in j]
seed.append([])
for i in flat_new:
if i not in flat_seed:
seed[-1].append(i)
flat_seed.append(i)
print seed
Upvotes: 0