Reputation: 15364
I have a list of items that should be inserted in a list-like data structure one after the other, and I have the indexes at which each item should be inserted. For example:
items = ['itemX', 'itemY', 'itemZ']
indexes = [0, 0, 1]
The expected result is to have a list like this: result = ['itemY', 'itemZ', 'itemX']
.
I'm able to get this result with this simple approach:
result = []
for index, item in zip(indexes, items):
result.insert(index, item)
However, this is a very slow approach once lists become huge (the complexity is O(n^2)). Is there any (relatively simple to implement) way to improve my basic approach? I guess I have to look at other data structures while I insert elements and finally transform that data structure into my result
list. Are trees a good option? Insert could be done maybe in O(log(n)) (instead of O(n)), but which specific tree-like structure should I use?
Or maybe something good can be achieved by just looking at all the indexes together (instead of using them one by one).
This is probably the worst case for my slow approach (always insert items at the beginning of the list):
n = 10**6 # some large number
items = list(range(n))
indexes = [0] * n
Upvotes: 7
Views: 1530
Reputation: 10231
You could use SortedList
, neutralize its sorting with a constant key function, and only use it for its fast insertions. Needs version 1.5.10 or older, as insert
got removed.
def insertions(indexes, items):
tmp = SortedList(key=lambda _: 0)
for index, item in zip(indexes, items):
tmp.insert(index, item)
return list(tmp)
(I imagine there's also something like it but without sorting that needs to be neutralized, sortedcontainers
is just something I know.)
Benchmark results:
indexes = [0] * 10**6 [randint(0, i) for i in range(10**6)]
--------------------------------------------------------------------------------
original 1540 seconds 759 seconds
neutralized SortedList 13 seconds 31 seconds
sorted mediants 201 seconds 249 seconds
sorted mediants optimized 42 seconds 72 seconds
Those last two solutions are another idea:
Use a SortedList
the normal way, but annotate each item with a fraction from 0 to 1 (and order by that). To insert between two items, use those items' mediant.
from sortedcontainers import SortedList
from fractions import Fraction
def insertions(indexes, items):
xs = SortedList([(Fraction(0), None), (Fraction(1), None)])
for index, item in zip(indexes, items):
a, c = xs[index][0].as_integer_ratio()
b, d = xs[index + 1][0].as_integer_ratio()
xs.add((Fraction(a+b, c+d), item))
return [item for _, item in xs[1:-1]]
Optimized version doing fractions myself:
from sortedcontainers import SortedList
class X(tuple):
def __lt__(self, other):
return self[0] * other[1] < self[1] * other[0]
def insertions(indexes, items):
xs = SortedList([X((0, 1, None)), X((1, 1, None))])
for index, item in zip(indexes, items):
L, R = xs[index : index+2]
xs.add(X((L[0] + R[0], L[1] + R[1], item)))
return [x[2] for x in xs[1:-1]]
Upvotes: 1
Reputation: 23955
Here's python code for a treap with a size decoration that allows insertion at specific indexes, and reordering of whole contiguous sections. It was adapted from C++ code, Kimiyuki Onaka's solution to the Hackerrank problem, "Give Me the Order." (I cannot guarantee that this adaptation is bug free -- a copy of the original code is available in the description of this question.)
import random
class Treap:
def __init__(self, value=None):
self.value = value
self.key = random.random()
self.size = 1
self.left = None
self.right = None
def size(t):
return t.size if t else 0
def update(t):
if t:
t.size = 1 + size(t.left) + size(t.right)
return t
def merge(a, b):
if not a:
return b
if not b:
return a
if a.key > b.key:
a.right = merge(a.right, b)
return update(a)
else:
b.left = merge(a, b.left)
return update(b)
def split(t, i):
if not t:
return None, None
if i <= size(t.left):
u, t.left = split(t.left, i)
return u, update(t)
else:
t.right, u = split(t.right, i - size(t.left) - 1)
return update(t), u
def insert(t, i, value):
left, right = split(t, i)
u = Treap(value)
return merge(merge(left, u), right)
def inorder(treap):
if not treap:
return
if treap.left:
inorder(treap.left)
print(treap.value)
if treap.right:
inorder(treap.right)
Output:
lst = ['itemX', 'itemY', 'itemZ']
idxs = [0, 0, 1]
t = None
for i in range(len(lst)):
t = insert(t, idxs[i], lst[i])
inorder(t)
"""
itemY
itemZ
itemX
"""
Upvotes: 1