Sean Anderson

Reputation: 29331

Algorithm to allow re-ordering of objects while only requiring update to a constant number of object positions

I have a large collection of objects I wish to keep in order based on one of their properties.

As an example, let's assume an object could potentially look like:

var myObject = {
    id: 'c_1',
    position: 0 
}

A naive implementation of an ordered collection would look like:

[{
    id: 'c_1',
    position: 0 
}, {
    id: 'c_2',
    position: 1
}, {
    id: 'c_3',
    position: 2 
}, {
    id: 'c_4',
    position: 3 
}, {
    id: 'c_5',
    position: 4 
}]

The reason this is a naive implementation is that if 'c_4' wishes to be moved between 'c_1' and 'c_2', the positions of 'c_2', 'c_3', and 'c_4' must all be updated to accommodate the move. This means that, on average, N/2 + 1 elements are affected by any given reorder of the array of objects.
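To make the cost concrete, here is a minimal sketch of such a naive move; the function name and the shift-based semantics are my own illustration, not from the question:

```javascript
// Move the object with `id` to dense slot `newPosition`, shifting every
// object whose position lies between the old and new slots. Returns the
// number of objects whose position had to be rewritten.
function naiveMove(collection, id, newPosition) {
  const item = collection.find(o => o.id === id);
  const oldPosition = item.position;
  let touched = 0; // neighbors we had to shift
  for (const o of collection) {
    if (o === item) continue;
    if (oldPosition < newPosition && o.position > oldPosition && o.position <= newPosition) {
      o.position -= 1; touched += 1; // moved item passed them going right
    } else if (newPosition < oldPosition && o.position >= newPosition && o.position < oldPosition) {
      o.position += 1; touched += 1; // moved item passed them going left
    }
  }
  item.position = newPosition;
  return touched + 1; // shifted neighbors plus the moved item itself
}
```

Moving 'c_4' from position 3 to position 1 in the five-element example above touches 'c_2', 'c_3', and 'c_4' itself, so the call returns 3.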

The solution I have come up with presently is less naive, but has its own set of flaws.

Instead of positioning each element directly next to each other (i.e. 0, 1, 2, 3, 4), I leave a 'sufficiently large' gap between each element:

[{
    id: 'c_1',
    position: 10000 
}, {
    id: 'c_2',
    position: 20000
}, {
    id: 'c_3',
    position: 30000 
}, {
    id: 'c_4',
    position: 40000 
}, {
    id: 'c_5',
    position: 50000
}]

Now, if 'c_4' wishes to be moved between 'c_1' and 'c_2' its position is set to the mid-point of its two proposed new neighbors. 'c_4' would have a position of 15000 and no other elements would need to be affected.

This solution starts to break down after a sufficiently large number of re-orders, though: each midpoint insertion roughly halves a gap, so a gap of 10000 survives only about log2(10000) ≈ 13 consecutive insertions between the same two neighbors, after which the array must be re-indexed.
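A minimal sketch of the gap scheme, including the re-indexing escape hatch (the helper names and the collision check are my own assumptions about how this would be wired up):

```javascript
const GAP = 10000;

// Move `id` between two neighbors by taking the midpoint of their
// positions. Only the moved object changes -- unless the midpoint
// collides with a neighbor, in which case the gap is exhausted and we
// fall back to re-indexing the whole collection with fresh gaps
// (the caller should retry the move afterwards).
function moveBetween(collection, id, leftPos, rightPos) {
  const item = collection.find(o => o.id === id);
  const mid = Math.floor((leftPos + rightPos) / 2);
  if (mid === leftPos || mid === rightPos) {
    // Gap exhausted: the O(n) re-index the question describes.
    collection.sort((a, b) => a.position - b.position);
    collection.forEach((o, i) => { o.position = (i + 1) * GAP; });
    return 'reindexed';
  }
  item.position = mid;
  return 'moved';
}
```

For example, moving 'c_4' between positions 10000 and 20000 sets its position to 15000 and touches nothing else.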

I'm wondering if there is another, more elegant solution to my dilemma?

As I'm typing this, I realize I could stop expecting position to be an integer and allow it to become a double, which would allow for far more positions between any two given elements. Perhaps that's the right call, but a double's 52-bit mantissa still runs out after enough repeated halvings, so it's really just a re-hashing of my non-naive implementation.

Upvotes: 2

Views: 342

Answers (3)

joshtkling

Reputation: 3538

This is exactly the problem that balanced binary search trees are designed to solve. The main downside: random access to your data becomes O(log n) rather than O(1), but insertions and removals also take O(log n), which is a vast improvement over your current O(n) solution.

To change the position of an element:

  1. Remove the element from the BST: O(log n)
  2. Mutate the element value: O(1)
  3. Re-insert the element into the BST: O(log n)

Since each of these steps costs at most O(log n), your ultimate runtime for changing the value of any element is O(log n) as well.

This isn't quite the constant number of modifications that you were looking for, but I'm not familiar with an algorithm that can provide you that. This also requires only O(n) memory usage.
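A minimal sketch of that remove / mutate / re-insert pattern, using a plain (unbalanced) BST keyed on position for brevity; a production version would use a self-balancing tree (AVL, red-black) to actually guarantee O(log n), but the three-step update is the same either way:

```javascript
// Plain BST node; no balancing, for illustration only.
class Node {
  constructor(item) { this.item = item; this.left = null; this.right = null; }
}

function insert(root, item) {
  if (root === null) return new Node(item);
  if (item.position < root.item.position) root.left = insert(root.left, item);
  else root.right = insert(root.right, item);
  return root;
}

function remove(root, position) {
  if (root === null) return null;
  if (position < root.item.position) { root.left = remove(root.left, position); return root; }
  if (position > root.item.position) { root.right = remove(root.right, position); return root; }
  if (root.left === null) return root.right;   // zero or one child
  if (root.right === null) return root.left;
  let succ = root.right;                        // two children: in-order successor
  while (succ.left !== null) succ = succ.left;
  root.item = succ.item;
  root.right = remove(root.right, succ.item.position);
  return root;
}

// Changing an element's position = remove, mutate, re-insert.
function reposition(root, item, newPosition) {
  root = remove(root, item.position);
  item.position = newPosition;
  return insert(root, item);
}

// In-order traversal recovers the ordered collection.
function inorder(root, out = []) {
  if (root === null) return out;
  inorder(root.left, out);
  out.push(root.item.id);
  inorder(root.right, out);
  return out;
}
```

With the question's five elements, repositioning 'c_4' to 15000 yields the in-order sequence c_1, c_4, c_2, c_3, c_5 without touching any other element's position.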

Upvotes: 1

Ambroz Bizjak

Reputation: 8105

As was mentioned in joshtkling's answer, self balancing binary search trees will help here. However, I will show you a special form of such trees that might interest you.

Usually, to establish a binary search tree, you select some attribute of the elements which defines the ordering. But suppose that we do not have any such attribute, we just want to be able to insert nodes into particular places. For example, in an abstract sense we want these operations, with no worse than O(log n) efficiency:

  • insertBefore(existing_node, new_node)
  • insertAfter(existing_node, new_node)
  • insertAt(index, new_node)
  • getIndex(existing_node)

Here, the index of a node in the tree is defined to be the number of nodes in the tree which precede it.

Such a data structure can be achieved by a binary search tree with the following special properties:

  1. In each node of the tree, you store the number of nodes in the entire subtree defined by the node. You update these in all operations that modify the tree, which is possible without an asymptotic degradation of performance. These node counts allow an efficient implementation of the insertAt and getIndex operations.

  2. You use the index of each node (as computed by getIndex) as the key which defines the ordering of the tree. This may look like a circular definition problem, but it can be made to work with appropriate implementation of the insertion and removal code. For example, the insert wouldn't actually call getIndex on various nodes as it descends down the tree, but would use the node counts seen so far, similarly to how getIndex operates.
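As a sketch of the two properties above, here is a count-augmented BST in JavaScript (names like `CNode`, `insertAt`, and `getAt` are my own, and balancing is omitted for brevity; a real implementation maintains the counts through rotations). `getAt` is the descent-by-counts counterpart of `getIndex`, which would additionally need parent pointers to walk upward summing the left-subtree sizes it skips:

```javascript
// Each node stores the size of its entire subtree.
class CNode {
  constructor(value) { this.value = value; this.left = null; this.right = null; this.count = 1; }
}
const size = n => (n === null ? 0 : n.count);

// Insert `value` so that exactly `index` existing nodes precede it.
// Note: no key comparison anywhere -- navigation uses only the counts.
function insertAt(root, index, value) {
  if (root === null) return new CNode(value);
  const leftSize = size(root.left);
  if (index <= leftSize) root.left = insertAt(root.left, index, value);
  else root.right = insertAt(root.right, index - leftSize - 1, value);
  root.count = size(root.left) + size(root.right) + 1;
  return root;
}

// Return the value of the node that has `index` nodes before it.
function getAt(root, index) {
  const leftSize = size(root.left);
  if (index < leftSize) return getAt(root.left, index);
  if (index === leftSize) return root.value;
  return getAt(root.right, index - leftSize - 1);
}
```

For example, inserting 'a' at index 0, 'c' at index 1, then 'b' at index 1 produces the sequence a, b, c.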

I have written an open source implementation in C, available here. While the implementation is complex due to the internal usage of a very flexible AVL tree, it is easy to use. The external interface is similar in nature to an intrusive linked list. You can find an example program here.

Upvotes: 0

jamesplease

Reputation: 12879

These are the steps of the problem:

  1. The initial state of your data, a list with indices that matter
  2. A change to the index of an item in the list
  3. The operation to update all of the items in the list
  4. The final state of the data

There are at least three ways to sync #1 and #4 between the client and the server, which I'll outline below. Note that I will not compare the computational intensity of the three options; by the sound of it, processing speed isn't your primary concern, but rather bandwidth is. With that in mind, here are the three choices:

Method One: Send Over The Final State

The first method, which is the most obvious one and the one you're having issues with, is to update the server by passing along #4, the final state. As your lists get longer, this becomes quite a large payload to send over.

Pros

  • #3 is performed a single time
  • It works well with the most common HTTP verbs, like POST
  • It's the most obvious; it's easy to understand what's going on

Cons

  • As the list grows the data you send over the wire grows quite large, as you pointed out

Method Two: Send The Changes Only

The second option reduces bandwidth to an absolute minimum. To do this you can pass #2 over the wire and perform #3 on the server. This would duplicate the operation logic (I imagine you'd still want to perform it on the client rather than wait for the server's response) but keep bandwidth usage to a minimum.

This might look like the following:

{
  oldIndex: 2134,
  newIndex: 54
}

This gives the server everything it needs to do #3 and get to #4. This is easily expanded to n operations by creating a list of operations that are performed in order, resulting in n objects in your update.
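A hypothetical server-side sketch of replaying such an operation (the handler names are illustrative, not from any framework):

```javascript
// Apply a single {oldIndex, newIndex} move to the server's copy of the list.
function applyMove(list, op) {
  const [item] = list.splice(op.oldIndex, 1); // remove from the old slot
  list.splice(op.newIndex, 0, item);          // insert at the new slot
  return list;
}

// A batch update is just the operations replayed in order.
function applyMoves(list, ops) {
  for (const op of ops) applyMove(list, op);
  return list;
}
```

For example, `applyMove(['a', 'b', 'c', 'd'], { oldIndex: 3, newIndex: 1 })` yields `['a', 'd', 'b', 'c']`.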

Pros

  • Bandwidth is the smallest it can possibly be

Cons

  • Duplicates the operation logic on the client and server, which isn't very DRY
  • Server does more computation
  • It's proprietary, and not as simple an API endpoint as the first method

Method Three: JSONPatch

The third option is to use the JSON Patch spec to generate a patch to send to the server. A JSON patch for a single move operation yields n+1 objects, where n = Math.abs(oldIndex - newIndex).

For instance,

given the initial array

[a, b, c, d, e, f, g, h, i, j]

and the operation

{
  oldIndex: 5,
  newIndex: 2
}

The formula above says the JSON Patch will have 5 - 2 + 1, or 4, objects. These objects are:

[
  {"op":"replace","path":"/5","value":"e"},
  {"op":"replace","path":"/4","value":"d"},
  {"op":"replace","path":"/3","value":"c"},
  {"op":"replace","path":"/2","value":"f"}
]

If you choose to go with JSONPatch I suggest you use a library like JSON-Patch to handle it.
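For illustration, a sketch of how such a replace-only patch could be derived for a single move (this is my own derivation matching the example above, not an excerpt from any library): every index between `newIndex` and `oldIndex` inclusive gets a "replace" op carrying the value that ends up there.

```javascript
// Build a replace-only JSON Patch describing one move within `array`.
function movePatch(array, oldIndex, newIndex) {
  const result = array.slice();
  const [item] = result.splice(oldIndex, 1); // simulate the move on a copy
  result.splice(newIndex, 0, item);
  const lo = Math.min(oldIndex, newIndex);
  const hi = Math.max(oldIndex, newIndex);
  const patch = [];
  for (let i = hi; i >= lo; i--) {
    patch.push({ op: 'replace', path: '/' + i, value: result[i] });
  }
  return patch; // n + 1 ops, where n = Math.abs(oldIndex - newIndex)
}
```

Running it on the example input (`oldIndex: 5`, `newIndex: 2` against `[a..j]`) reproduces the four-object patch shown above.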

Pros:

  • Handles more than just moves

Cons:

  • More bandwidth than just passing the operation itself
  • Potentially more computation
  • Requires an extra library
  • No library handles all of the edge cases (I have several open issues on JSON-Patch about cases where it fails)
  • Depends on the HTTP PATCH verb

Upvotes: 3
