Ciasto piekarz
Ciasto piekarz

Reputation: 8277

Most efficient way of inserting in between a list?

I am reading a file and building a list a2. and I want to insert 3 lines to list a2 from list b after first two items .

b = ["This is a line", "another line", "and another one"]
a2 = ['a1', 'a2', 'a3']

i = 0
for x, y in map(None, a2[0:2], a2):
    i = i + 1
    if x == y:
        continue
    else:
        for newLine in b:
            a2.insert(i-1, newLine)
            i = i+1
print a2

The above does give me expected result like ['a1', 'a2', 'This is a line', 'another line', 'and another one', 'a3'] but since I am going to build list out of huge text file and insert few lines in between I am thinking I have to make it more performance intuitive!

Upvotes: 5

Views: 2116

Answers (3)

skyking
skyking

Reputation: 14359

If you really want to do what you're telling in the question. The fastest solution (if the array you're inserting into grows large) would be to use a custom container class instead. It has been pointed out that reversed list would be faster, but reversing the list each time you insert an element (and reversing it again after) is also costly. Something like this:

class ReverseList:
    def __init__(self, *args, **kwds):
        self.revlist = list(*args, **kwds)
        self.revlist.reverse()

    def __getitem__(self, key):
        # if you need slicing you need to improve this:
        return self.revlist[-key] 

    def __setitem__(self, key, val):
        # if you need slicing you need to improve this:
        return self.revlist[-key] = val

    def insert(self, pos, val):
        self.revlist.insert(-pos, val)

    # etc

Upvotes: 0

gsb-eng
gsb-eng

Reputation: 1209

I'm giving this answer in a belief that your list a2 is not a fixed size at the beginning and you've to insert all the values of list b into list a2 after index 1.

Generally list.insert() works in this way, if list l1 size is n (imagine that this n is hege) and if you try to add another huge list l2 of values from the beginning lets say from position 2 l1.insert(2, val) this should move all the other elements of list l1 from 2 to n - 1 to next positions for every insert.

We can avoid this by inserting from the ending by reversing both the l1 and l2.

Lets consider your lists l1 , l2 and we've to insert all the l2 values into l1 from the index 2.

>>> l1 = range(1, 10)
>>> l2 = range(10, 20)
>>> l1
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> l2
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

After inserting the l2 into l1 with the below way.......

>>> i = 2
>>> for j in l2:
...     l1.insert(i, j)
...     i += 1
>>> l1
[1, 2, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 3, 4, 5, 6, 7, 8, 9]

In the above way of inserting for every insert the values 3, 4, 5, 6, 7, 8, 9 in l1 has been moved to next positions after proper list.resize. Assume that what will heppen if your 'l1' size is of ten million this movement of values becomes a overhead.

To avoid this data movement within the list for every insert, you can insert values from the end, in your case you've to reverse the list l1 and l2 and do l1.insert(-2, l2.val)

>>> l1
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> l2
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> l1.reverse()
>>> l2.reverse()
>>> l1
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> l2
[19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
>>> for i in l2:
...     l1.insert(-2, i)
... 

After inserting you'll get this ..

>>> l1
[9, 8, 7, 6, 5, 4, 3, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 2, 1]

If you observe the data movement happened in this way of inserting, only values 2, 1 has been moved all the time while inserting the values of l2.

You can simply reverse the l1 to get the desired list of values.

>>> l1.reverse()
>>> l1
[1, 2, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 3, 4, 5, 6, 7, 8, 9]

in this way we can avoid the most happening data movement in list.insert().

Time Stats: https://ideone.com/owzWza

Note: This whole solution works well in your case but in case if you've insert some value at the middle of the list you've to think again for another best solution.

Upvotes: 0

Anand S Kumar
Anand S Kumar

Reputation: 90909

How about -

a2[2:2] = b

Demo -

>>> b = ["This is a line", "another line", "and another one"]
>>> a2 = ['a1', 'a2', 'a3']
>>> a2[2:2] = b
>>> a2
['a1', 'a2', 'This is a line', 'another line', 'and another one', 'a3']

Timing Information on some of the methods I know of (including the one posted by OP) -

def func1():
    b = ["This is a line", "another line", "and another one"]
    a2 = ['a1', 'a2', 'a3']
    i = 0
    for x, y in map(None, a2[0:2], a2):
        i = i + 1
        if x == y:
            continue
        else:
            for newLine in b:
                a2.insert(i-1, newLine)
                i = i+1
    return a2


def func2():
    b = ["This is a line", "another line", "and another one"]
    a2 = ['a1', 'a2', 'a3']
    a2 = a2[:2] + b + a2[2:]
    return a2

def func3():
    b = ["This is a line", "another line", "and another one"]
    a2 = ['a1', 'a2', 'a3']
    a2[2:2] = b
    return a2


import timeit

print timeit.timeit(func1,number=500000)
print timeit.timeit(func2,number=500000)
print timeit.timeit(func3,number=500000)

Result -

1.81288409233
0.621006011963
0.341125011444

Timing results of a having 100000 elements and b having 1000 elements -

def func1():
    global a2
    global b
    i = 0
    for x, y in map(None, a2[0:2], a2):
        i = i + 1
        if x == y:
            continue
        else:
            for newLine in b:
                a2.insert(i-1, newLine)
                i = i+1
            break
    return a2


def func2():
    global a2
    global b
    a2 = a2[:2] + b + a2[2:]
    return a2

def func3():
    global a2
    global b
    a2[2:2] = b
    return a2

def func4():
    global a2
    global b
    a2.reverse()
    b.reverse()
    for i in b:
        a2.insert(-2, i)
    return a2

import timeit

a2 = ['a1' for _ in range(100000)]
b = ['a2' for i in range(1000)]

print timeit.timeit(func1,number=10,setup = 'from __main__ import a2,b')
print timeit.timeit(func2,number=10,setup = 'from __main__ import a2,b')
print timeit.timeit(func3,number=10,setup = 'from __main__ import a2,b')
print timeit.timeit(func4,number=10,setup = 'from __main__ import a2,b')

Result -

1.00535297394
0.0210499763489
0.001296043396
0.0044310092926

Reference to the timing test - https://ideone.com/k4DANI

Upvotes: 4

Related Questions