Riccardo Bucco
Riccardo Bucco

Reputation: 15364

Efficiently insert multiple elements in a list (or another data structure) keeping their order

I have a list of items that should be inserted in a list-like data structure one after the other, and I have the indexes at which each item should be inserted. For example:

items = ['itemX', 'itemY', 'itemZ']
indexes = [0, 0, 1]

The expected result is to have a list like this: result = ['itemY', 'itemZ', 'itemX'].

I'm able to get this result with this simple approach:

result = []
for index, item in zip(indexes, items):
    result.insert(index, item)

However, this is a very slow approach once lists become huge (the complexity is O(n^2)). Is there any (relatively simple to implement) way to improve my basic approach? I guess I have to look at other data structures while I insert elements and finally transform that data structure into my result list. Are trees a good option? Insert could be done maybe in O(log(n)) (instead of O(n)), but which specific tree-like structure should I use?

Or maybe something good can be achieved by just looking at all the indexes together (instead of using them one by one).

This is probably the worst case for my slow approach (always insert items at the beginning of the list):

n = 10**6 # some large number
items = list(range(n))
indexes = [0] * n

Upvotes: 7

Views: 1530

Answers (2)

no comment
no comment

Reputation: 10231

You could use SortedList, neutralize its sorting with a constant key function, and only use it for its fast insertions. Needs version 1.5.10 or older, as insert got removed.

def insertions(indexes, items):
    tmp = SortedList(key=lambda _: 0)
    for index, item in zip(indexes, items):
        tmp.insert(index, item)
    return list(tmp)

(I imagine there's also something like it but without sorting that needs to be neutralized, sortedcontainers is just something I know.)

Benchmark results:

               indexes =   [0] * 10**6     [randint(0, i) for i in range(10**6)]
--------------------------------------------------------------------------------
original                  1540 seconds     759 seconds
neutralized SortedList      13 seconds      31 seconds
sorted mediants            201 seconds     249 seconds
sorted mediants optimized   42 seconds      72 seconds

Those last two solutions are another idea:

Use a SortedList the normal way, but annotate each item with a fraction from 0 to 1 (and order by that). To insert between two items, use those items' mediant.

from sortedcontainers import SortedList
from fractions import Fraction

def insertions(indexes, items):
    xs = SortedList([(Fraction(0), None), (Fraction(1), None)])
    for index, item in zip(indexes, items):
        a, c = xs[index][0].as_integer_ratio()
        b, d = xs[index + 1][0].as_integer_ratio()
        xs.add((Fraction(a+b, c+d), item))
    return [item for _, item in xs[1:-1]]

Optimized version doing fractions myself:

from sortedcontainers import SortedList

class X(tuple):
    def __lt__(self, other):
        return self[0] * other[1] < self[1] * other[0]

def insertions(indexes, items):
    xs = SortedList([X((0, 1, None)), X((1, 1, None))])
    for index, item in zip(indexes, items):
        L, R = xs[index : index+2]
        xs.add(X((L[0] + R[0], L[1] + R[1], item)))
    return [x[2] for x in xs[1:-1]]

Upvotes: 1

גלעד ברקן
גלעד ברקן

Reputation: 23955

Here's python code for a treap with a size decoration that allows insertion at specific indexes, and reordering of whole contiguous sections. It was adapted from C++ code, Kimiyuki Onaka's solution to the Hackerrank problem, "Give Me the Order." (I cannot guarantee that this adaptation is bug free -- a copy of the original code is available in the description of this question.)

import random

class Treap:
  def __init__(self, value=None):
    self.value = value
    self.key = random.random()
    self.size = 1
    self.left = None
    self.right = None


def size(t):
  return t.size if t else 0


def update(t):
  if t:
    t.size = 1 + size(t.left) + size(t.right)
  return t


def merge(a, b):
  if not a:
    return b
  if not b:
    return a

  if a.key > b.key:
    a.right = merge(a.right, b)
    return update(a)
  else:
    b.left = merge(a, b.left)
    return update(b)


def split(t, i):
  if not t:
    return None, None

  if i <= size(t.left):
    u, t.left = split(t.left, i)
    return u, update(t)
  else:
    t.right, u = split(t.right, i - size(t.left) - 1)
    return update(t), u


def insert(t, i, value):
  left, right = split(t, i)
  u = Treap(value)
  return merge(merge(left, u), right)


def inorder(treap):
  if not treap:
    return

  if treap.left:
    inorder(treap.left)

  print(treap.value)

  if treap.right:
    inorder(treap.right)

Output:

lst = ['itemX', 'itemY', 'itemZ']
idxs = [0, 0, 1]

t = None

for i in range(len(lst)):
  t = insert(t, idxs[i], lst[i])

inorder(t)

"""
itemY
itemZ
itemX
"""

Upvotes: 1

Related Questions