Qubix

Reputation: 4353

Concatenate lists and remove overlaps

For two lists

A = [1,2,3,4,5]
B = [4,5,6,7]

I want the result

C = [1,2,3,4,5,6,7]

if I specify an overlap of 2.

Code so far:

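lst1 = [1, 2, 3, 4, 5]   # hypothetical example inputs
lst2 = [4, 5, 6, 7]
lst3 = [6, 7, 8]

concat_list = []
word_overlap = 2

for lst in [lst1, lst2, lst3]:
    if concat_list:
        if concat_list[-word_overlap:] != lst[:word_overlap]:
            # No overlap of exactly word_overlap elements: append all of lst
            concat_list += lst
        else:
            # Overlap found: skip the elements already at the end
            concat_list += lst[word_overlap:]
    else:
        # First list: nothing to compare against yet
        concat_list += lst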

I'm doing this for lists of strings, but it should work the same way.

EDIT:

What I want my code to do is first check whether there is any overlap (of 1, of 2, etc.), then concatenate the lists, eliminating the overlap (so I don't get duplicate elements).

[1,2,3,4,5] + [4,5,6,7] = [1,2,3,4,5,6,7]

but

[1,2,3] + [4,5,6] = [1,2,3,4,5,6]

I also want it to check for any overlap smaller than the word_overlap I set.
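A minimal sketch of what the edit describes, assuming the overlap should be the longest match of at most word_overlap elements (trying word_overlap first, then smaller sizes down to 1) and that the lists are chained left to right. The helper name `merge_with_overlap` and the example word lists are hypothetical.

```python
def merge_with_overlap(lists, word_overlap):
    """Concatenate lists, dropping repeated elements where the tail of the
    result so far matches the head of the next list (hypothetical helper)."""
    result = []
    for lst in lists:
        # Try the largest allowed overlap first, then shrink it down to 1
        max_size = min(word_overlap, len(result), len(lst))
        for size in range(max_size, 0, -1):
            if result[-size:] == lst[:size]:
                result += lst[size:]  # skip the duplicated head of lst
                break
        else:
            result += lst  # no overlap found: append the whole list
    return result


print(merge_with_overlap([[1, 2, 3, 4, 5], [4, 5, 6, 7]], 2))  # [1, 2, 3, 4, 5, 6, 7]
print(merge_with_overlap([[1, 2, 3], [4, 5, 6]], 2))           # [1, 2, 3, 4, 5, 6]
print(merge_with_overlap([["the", "cat"], ["cat", "sat"], ["sat", "down"]], 2))
# ['the', 'cat', 'sat', 'down']
```

Because word_overlap is only an upper bound here, an overlap of 1 is still detected when word_overlap is 2, which is the "smaller overlap" case from the edit.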

Upvotes: 0

Views: 938

Answers (4)

Kruupös

Reputation: 5474

You can use sets and their union:

s.union(t): new set with elements from both s and t

>>> list(set(A) | set(B))
[1, 2, 3, 4, 5, 6, 7]

But this way you can't control how many elements overlap. A set also discards the original order of the elements and any duplicates inside a single list.
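If what you need is an order-preserving de-duplication rather than a positional overlap, one swapped-in technique (not the set union above) is `dict.fromkeys`, since dict keys are unique and keep insertion order in Python 3.7+. A short sketch, assuming removing every repeat is acceptable:

```python
A = [1, 2, 3, 4, 5]
B = [4, 5, 6, 7]

# dict keys are unique and (since Python 3.7) keep insertion order,
# so this drops every repeated element while preserving first-seen order.
C = list(dict.fromkeys(A + B))
print(C)  # [1, 2, 3, 4, 5, 6, 7]

# Caveat: it removes *all* repeats, not only the overlap at the boundary.
print(list(dict.fromkeys(["b", "a", "b"] + ["c"])))  # ['b', 'a', 'c']
```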

To answer your question, you have to resort to a trick and combine several set operations:

  • get a new list with elements from both A and B (union)
  • get a new list with elements common to A and B (intersection)
  • keep only the number of elements you need from this list, using slicing
  • get a new list with elements in either the union or the sliced list, but not both (symmetric difference)

OVERLAP = 1

A = [1, 2, 3, 4, 5]
B = [4, 5, 6, 7]

C = list(set(A) | set(B)) # union: [1, 2, 3, 4, 5, 6, 7]
D = list(set(A) & set(B)) # intersection: [4, 5]
D = D[OVERLAP:] # [5]

print(list(set(C) ^ set(D))) # symmetric difference: [1, 2, 3, 4, 6, 7]

Just for fun, the same thing as a one-liner:

list((set(A) | set(B)) ^ set(list(set(A) & set(B))[OVERLAP:])) # [1, 2, 3, 4, 6, 7]

where OVERLAP is the number of common elements you want to keep in the union.

Upvotes: 1

Siddhant

Reputation: 76

Assuming both lists contain consecutive integers and list a always has smaller values than list b, I came up with this solution. It also tells you whether there is an overlap.

def concatenate_list(a,b):
    max_a = a[-1]    # last (largest, by assumption) value of a
    min_b = b[0]     # first (smallest, by assumption) value of b
    if max_a >= min_b:
        print('overlap exists')
        b = b[(max_a - min_b) + 1:]
    else:
        print('no overlap')
    return a + b
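A quick check of that function on the question's input (rewritten with Python 3 `print()` calls so it runs as-is), plus a made-up input that breaks the "consecutive values" assumption to show why the assumption matters:

```python
def concatenate_list(a, b):
    max_a = a[-1]   # last (largest, by assumption) value of a
    min_b = b[0]    # first (smallest, by assumption) value of b
    if max_a >= min_b:
        print('overlap exists')
        # Overlap length computed from values: only valid for consecutive integers
        b = b[(max_a - min_b) + 1:]
    else:
        print('no overlap')
    return a + b


print(concatenate_list([1, 2, 3, 4, 5], [4, 5, 6, 7]))  # [1, 2, 3, 4, 5, 6, 7]
print(concatenate_list([1, 2, 3], [4, 5, 6]))           # [1, 2, 3, 4, 5, 6]
# Not consecutive: 5 - 4 + 1 = 2 elements are dropped from b, losing the 4
print(concatenate_list([1, 2, 5], [4, 5, 6]))           # [1, 2, 5, 6]
```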

For strings, you can also do this:

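def concatenate_list_strings(a,b):
    # Checks overlaps of 1, 2, ... elements and stops at the first match
    for count in range(min(len(a), len(b))):
        max_a = a[len(a) - 1 - count:]   # last count+1 elements of a
        min_b = b[0:count + 1]           # first count+1 elements of b

        if max_a == min_b:
            print('overlap count ' + str(count + 1))
            return a + b[count + 1:]
    return a + b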
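Because that loop tests overlaps of 1, 2, 3, ... elements and returns at the first match, it picks the shortest overlap. When the boundary elements repeat, that can differ from taking the longest overlap. The two hypothetical helpers below contrast the two search orders on a made-up input:

```python
def merge_shortest_first(a, b):
    # Try overlap sizes 1, 2, ... and stop at the first match
    for size in range(1, min(len(a), len(b)) + 1):
        if a[-size:] == b[:size]:
            return a + b[size:]
    return a + b


def merge_longest_first(a, b):
    # Try overlap sizes from the largest possible down to 1
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return a + b[size:]
    return a + b


a = ['x', 'y', 'x']
b = ['x', 'y', 'x', 'z']
print(merge_shortest_first(a, b))  # ['x', 'y', 'x', 'y', 'x', 'z']
print(merge_longest_first(a, b))   # ['x', 'y', 'x', 'z']
```

Which one is right depends on the data; for the question's examples both give the same result.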

Upvotes: 0

Yann Vernier

Reputation: 15877

Here's a naïve variant:

def concat_nooverlap(a,b):
    maxoverlap=min(len(a),len(b))
    for overlap in range(maxoverlap,-1,-1):
        # Check for longest possible overlap first
        if a[-overlap:]==b[:overlap]:
            break  # Found an overlap, don't check any shorter
    return a+b[overlap:]

It would be more efficient with types that support slicing by reference, such as buffers or numpy arrays.
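A sketch of that idea for byte strings, assuming the data can be held as `bytes`: slicing a `memoryview` creates a view instead of a copy, so the overlap comparisons don't build new objects. The function name is hypothetical; the search is the same longest-first loop as above, stopping at 1 so the `-0` slice case never comes up.

```python
def concat_nooverlap_bytes(a: bytes, b: bytes) -> bytes:
    va, vb = memoryview(a), memoryview(b)
    for overlap in range(min(len(a), len(b)), 0, -1):
        # Slices of a memoryview share memory with the original bytes
        if va[-overlap:] == vb[:overlap]:
            return a + b[overlap:]  # building the result copies anyway
    return a + b


print(concat_nooverlap_bytes(b"hello wor", b"world"))  # b'hello world'
```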

One quite odd thing this does is that, upon reaching overlap=0, it compares the entirety of a (sliced, which is a copy for a list) with an empty slice of b. That comparison fails unless a is empty, but the loop still ends with overlap=0, so the return value is correct. We can handle this case specifically with a slight rewrite:
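The overlap=0 case behaves this way because `-0` is just `0` in Python, so `a[-0:]` is the whole list rather than an empty tail:

```python
a = [1, 2, 3]
b = [4, 5]

print(a[-0:])           # [1, 2, 3]: the whole list, because -0 == 0
print(b[:0])            # []: an empty slice
print(a[-0:] == b[:0])  # False, unless a is empty
```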

def concat_nooverlap(a,b):
    maxoverlap=min(len(a),len(b))
    for overlap in range(maxoverlap,0,-1):
        # Check for longest possible overlap first
        if a[-overlap:]==b[:overlap]:
            return a+b[overlap:]
    else:
        return a+b

Upvotes: 1

Dries De Rydt

Reputation: 808

Not sure if I correctly interpreted your question, but you could do it like this:

A = [1,2,3,4,5]
B = [4,5,6,7]

overlap = 2

print(A[0:-overlap] + B)

If you want to make sure they have the same value, your check could be along the lines of:

if A[-overlap:] == B[:overlap]:
    print(A[0:-overlap] + B)
else:
    print("error")
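One caveat with `A[0:-overlap]`: for `overlap = 0` the slice is `A[0:0]`, which is empty, so all of A would be dropped. A sketch that slices with `len(A) - overlap` instead, assuming the caller passes an overlap no larger than `len(A)`; the function name is hypothetical, and it raises `ValueError` instead of printing "error":

```python
def join_with_overlap(A, B, overlap):
    # len(A) - overlap stays correct when overlap == 0
    cut = len(A) - overlap
    if A[cut:] == B[:overlap]:
        return A[:cut] + B
    raise ValueError("last %d elements of A do not match the start of B" % overlap)


print(join_with_overlap([1, 2, 3, 4, 5], [4, 5, 6, 7], 2))  # [1, 2, 3, 4, 5, 6, 7]
print(join_with_overlap([1, 2, 3], [4, 5, 6], 0))           # [1, 2, 3, 4, 5, 6]
```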

Upvotes: 0
