Enscivwy

Reputation: 99

Writing a faster implementation for insert() in Python

Essentially, I need to write a faster implementation as a replacement for insert() to insert an element in a particular position in a list. The inputs are given in a list as [(index, value), (index, value), (index, value)]

For example: Doing this to insert 10,000 elements in a 1,000,000 element list takes about 2.7 seconds

def do_insertions_simple(l, insertions):
    """Performs the insertions specified into l.
    @param l: list in which to do the insertions.  It is not modified.
    @param insertions: list of pairs (i, x), indicating that x should
        be inserted at position i.
    """
    r = list(l)
    for i, x in insertions:
        r.insert(i, x)
    return r

My assignment asks me to speed up the time taken to complete the insertions by a factor of 8 or more.

My current implementation:

def do_insertions_fast(l, insertions):
    """Implement here a faster version of do_insertions_simple """
    #insert insertions[x][i] at l[i]
    result=list(l)
    for x,y in insertions:
      result = result[:x]+list(y)+result[x:]
    return result

Sample input:

import string
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
insertions = [(0, 'a'), (2, 'b'), (2, 'b'), (7, 'c')]
r1 = do_insertions_simple(l, insertions)
r2 = do_insertions_fast(l, insertions)
print("r1:", r1)
print("r2:", r2)
assert_equal(r1, r2)

is_correct = False
for _ in range(20):
    l, insertions = generate_testing_case(list_len=100, num_insertions=20)
    r1 = do_insertions_simple(l, insertions)
    r2 = do_insertions_fast(l, insertions)
    assert_equal(r1, r2)
    is_correct = True

The error I'm getting while running the above code:

r1: ['a', 0, 'b', 'b', 1, 2, 3, 'c', 4, 5, 6, 7, 8, 9]
r2: ['a', 0, 'b', 'b', 1, 2, 3, 'c', 4, 5, 6, 7, 8, 9]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-54e0c44a8801> in <module>()
     12     l, insertions = generate_testing_case(list_len=100, num_insertions=20)
     13     r1 = do_insertions_simple(l, insertions)
---> 14     r2 = do_insertions_fast(l, insertions)
     15     assert_equal(r1, r2)
     16     is_correct = True

<ipython-input-7-b421ee7cc58f> in do_insertions_fast(l, insertions)
      4     result=list(l)
      5     for x,y in insertions:
----> 6       result = result[:x]+list(y)+result[x:]
      7     return result
      8     #raise NotImplementedError()

TypeError: 'float' object is not iterable

The file is using the nose framework to check my answers, so if there are any functions you don't recognize, they're probably from that framework.

I know that it is inserting the elements correctly; however, it keeps raising the error "'float' object is not iterable".

I've also tried a different method which did work (slicing the list, adding the element, appending the rest of the list, and then updating the list), but that was 10 times slower than insert().

I'm not sure how to continue

edit: I've been looking at the entire question wrong; for now I'll try to do it myself, but if I'm stuck again I'll ask a different question and link it here.

Upvotes: 3

Views: 3699

Answers (3)

kaya3

Reputation: 51063

One option is to use a different data structure, which supports faster insertions.

The obvious suggestion would be a binary tree of some sort. You can insert nodes into a balanced binary tree in O(log n) time, so long as you're able to find the right insertion point in O(log n) time. A solution to that is for each node to store and maintain its own subtree's cardinality; then you can find a node by index without iterating through the whole tree. Another possibility is a skip list, which supports insertion in O(log n) average time.
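The subtree-cardinality idea might be sketched like this (an unbalanced tree for brevity, so worst-case insertion is O(n) rather than O(log n); a real implementation would rebalance, and `insert_at`/`to_list` are names I'm introducing for illustration):

```python
class Node:
    """A BST node that tracks the size of its own subtree."""
    __slots__ = ("value", "size", "left", "right")

    def __init__(self, value):
        self.value = value
        self.size = 1
        self.left = None
        self.right = None

def insert_at(root, i, value):
    """Insert value so that it becomes the i-th element in in-order
    traversal, using subtree sizes to navigate by index."""
    if root is None:
        return Node(value)
    left_size = root.left.size if root.left else 0
    if i <= left_size:
        root.left = insert_at(root.left, i, value)
    else:
        # skip over the left subtree and this node itself
        root.right = insert_at(root.right, i - left_size - 1, value)
    root.size += 1
    return root

def to_list(root, out=None):
    """Flatten the tree back into a plain list (in-order)."""
    if out is None:
        out = []
    if root is not None:
        to_list(root.left, out)
        out.append(root.value)
        to_list(root.right, out)
    return out
```

With a self-balancing variant (e.g. an AVL or red-black tree maintaining the same `size` field), each `insert_at` call would be O(log n) instead of list.insert's O(n).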

However, the problem is that you are writing in Python, so you have a major disadvantage trying to write something faster than the built-in list.insert method, because that's implemented in C, and Python code is a lot slower than C code. It's not unusual to write an O(log n) algorithm in Python that only beats the built-in O(n) implementation for very large n, and even n = 1,000,000 may not be large enough to win by a factor of 8 or more. This could mean a lot of wasted effort if you try implementing your own data structure and it turns out not to be fast enough.

I think the expected solution for this assignment will be something like Heap Overflow's answer. That said, there is another way to approach this question which is worth considering because it avoids the complications of working out the correct indices to insert at if you do the insertions out of order. My idea is to take advantage of the efficiency of list.insert but to call it on shorter lists.


If the data is still stored in Python lists, then the list.insert method can still be used to get the efficiency of a C implementation, but if the lists are shorter then the insert method will be faster. Since you only need to win by a constant factor, you can divide the input list into, say, 256 sublists of roughly equal size. Then for each insertion, insert it at the correct index in the correct sublist; and finally join the sublists back together again. The time complexity is O(nm), which is the same as the "naive" solution, but it has a lower constant factor.

To compute the correct insertion index we need to subtract the lengths of the sublists to the left of the one we're inserting in; we can store the cumulative sublist lengths in a prefix sum array, and update this array efficiently using numpy. Here's my implementation:

from itertools import islice, chain, accumulate
import numpy as np

def do_insertions_split(lst, insertions, num_sublists=256):
    n = len(lst)
    sublist_len = n // num_sublists
    lst_iter = iter(lst)
    sublists = [list(islice(lst_iter, sublist_len)) for i in range(num_sublists-1)]
    sublists.append(list(lst_iter))
    lens = [0]
    lens.extend(accumulate(len(s) for s in sublists))
    lens = np.array(lens)

    for idx, val in insertions:
        # could use binary search, but num_sublists is small;
        # max(1, ...) guards the idx == 0 case, where lens[0] == 0
        # would otherwise match first and wrap around to the last sublist
        j = max(1, np.argmax(lens >= idx))
        sublists[j-1].insert(idx - lens[j-1], val)
        lens[j:] += 1

    return list(chain.from_iterable(sublists))

It is not as fast as @iz_'s implementation (linked from the comments), but it beats the simple algorithm by a factor of almost 20, which is sufficient according to the problem statement. The times below were measured using timeit on a list of length 1,000,000 with 10,000 insertions.

simple -> 2.1252768037122087 seconds
iz -> 0.041302349785668824 seconds
split -> 0.10893724981304054 seconds

Note that my solution still loses to @iz_'s by a factor of about 2.5. However, @iz_'s solution requires the insertion points to be sorted, whereas mine works even when they are unsorted:

lst = list(range(1_000_000))
insertions = [(randint(0, len(lst)), "x") for _ in range(10_000)]

# uncomment if the insertion points should be sorted
# insertions.sort()

r1 = do_insertions_simple(lst, insertions)
r2 = do_insertions_iz(lst, insertions)
r3 = do_insertions_split(lst, insertions)

if r1 != r2: print('iz failed') # prints
if r1 != r3: print('split failed') # doesn't print

Here is my timing code, in case anyone else wants to compare. I tried a few different values for num_sublists; anything between 200 and 1,000 seemed to be about equally good.

from timeit import timeit

algorithms = {
    'simple': do_insertions_simple,
    'iz': do_insertions_iz,
    'split': do_insertions_split,
}
reps = 10

for name, func in algorithms.items():
    t = timeit(lambda: func(lst, insertions), number=reps) / reps
    print(name, '->', t, 'seconds')

Upvotes: 2

Kelly Bundy

Reputation: 27609

From your question, emphasis mine:

I need to write a faster implementation as a replacement for insert() to insert an element in a particular position in a list

You won't be able to. If there were a faster way, the existing insert() function would already use it. Anything you write in pure Python won't even get close to its speed.

What you can do is write a faster way to do multiple insertions.

Let's look at an example with two insertions:

>>> a = list(range(15))
>>> a.insert(5, 'X')
>>> a.insert(10, 'Y')
>>> a
[0, 1, 2, 3, 4, 'X', 5, 6, 7, 8, 'Y', 9, 10, 11, 12, 13, 14]

Since every insert shifts all values to the right of it, this in general is an O(m*(n+m)) time algorithm, where n is the original size of the list and m is the number of insertions.

Another way to do it is to build the result piece by piece, taking the insertion points into account:

>>> a = list(range(15))
>>> b = []
>>> b.extend(a[:5])
>>> b.append('X')
>>> b.extend(a[5:9])
>>> b.append('Y')
>>> b.extend(a[9:])
>>> b
[0, 1, 2, 3, 4, 'X', 5, 6, 7, 8, 'Y', 9, 10, 11, 12, 13, 14]

This is O(n+m) time, as all values are copied just once and there's no shifting. It's just somewhat tricky to determine the correct piece lengths, as earlier insertions affect later ones, especially if the insertion indexes aren't sorted (in which case it also takes O(m log m) additional time to sort them). That's why I had to use a[5:9] and a[9:] instead of a[5:10] and a[10:].

(Yes, I know, extend/append internally copy some more if the capacity is exhausted, but if you understand things enough to point that out, then you also understand that it doesn't matter :-)
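The piece-lengths bookkeeping described above might be generalized along these lines (a sketch; `do_insertions_linear` is a name I'm introducing, and it assumes the insertion indexes are already sorted, each given relative to the list state at the time of that insertion, exactly as successive list.insert calls would see them):

```python
def do_insertions_linear(l, insertions):
    """Build the result in one pass; assumes insertions sorted by index."""
    result = []
    prev = 0  # position in l up to which we have already copied
    for k, (i, x) in enumerate(insertions):
        # k earlier insertions have shifted positions right by k,
        # so index i corresponds to position i - k in the original l
        result.extend(l[prev:i - k])
        result.append(x)
        prev = i - k
    result.extend(l[prev:])
    return result
```

On the example above, the two insertions (5, 'X') and (10, 'Y') produce the slices a[:5] and a[5:9], matching the hand-built result.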

Upvotes: 3

Aaron

Reputation: 11075

list(y) attempts to iterate over y and create a list of its elements. If y is a number (such as the float in your error), it is not iterable, and list(y) raises the error you mentioned. You instead probably want to create a list literal containing y, like so: [y]
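Applied to the code in the question, the fix might look like this (a sketch keeping the rest of do_insertions_fast as posted):

```python
def do_insertions_fast(l, insertions):
    result = list(l)
    for x, y in insertions:
        # wrap y in a list literal instead of calling list(y),
        # which fails when y is not iterable (e.g. a float)
        result = result[:x] + [y] + result[x:]
    return result
```

Note this only removes the TypeError; the repeated slicing still copies the whole list per insertion, so it won't meet the 8x speedup target on its own.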

Upvotes: 1
