SystematicFrank

Reputation: 17269

Sorting algorithm to keep sort position numbers updated

Every once in a while I must deal with a list of elements that the user can sort manually.

In most cases I try to rely on a model that uses an order-sensitive container. However, this is not always possible, and then I resort to adding a position field to my data. This position field is a double, so in principle I can always calculate a position between two numbers. This is not ideal, though, because I am concerned about reaching an edge case where I no longer have enough numerical precision to keep inserting between two numbers.
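
To put a number on that concern, here is a minimal sketch, assuming IEEE 754 doubles (Python's `float`) and the worst case where every new row is dropped right after the same row. The function name `midpoint_inserts_until_exhausted` is made up for illustration.

```python
def midpoint_inserts_until_exhausted(lo: float, hi: float) -> int:
    """Count how many times a midpoint can be squeezed in right after `lo`.

    Each new position is the midpoint of `lo` and the previously
    inserted position, which is the worst case for precision.
    """
    count = 0
    while True:
        mid = (lo + hi) / 2
        # Once the midpoint rounds onto one of its neighbours,
        # there is no representable double left between them.
        if not (lo < mid < hi):
            return count
        hi = mid
        count += 1


if __name__ == "__main__":
    print(midpoint_inserts_until_exhausted(2.0, 3.0))
```

Between 2 and 3 this runs out after roughly fifty insertions at the same spot. The exact count depends on the magnitude of the numbers involved, since doubles are denser near zero.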

I have doubts about the best approach to maintaining my position numbers. My first thought is to traverse all the rows and give them a round number after every insertion, like this:

Right after dropping a row between 2 and 3:

1   2   2.5   3   4    5

After position numbers update:

1   2   3     4   5    6

That, of course, might get heavy if I have a large number of entries. Not so much in memory, but in storing all the new values back to the disk/database. I usually work with some type of ORM and mobile software. Updating all the positions will pull every object off the disk and mark it as dirty, leading to a re-verification of all the related validation rules of my data model.

I could also wait until the precision is no longer enough to calculate a number between two positions. However, the user experience would be bad, since the same operation would no longer always take the same amount of time.
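
A minimal sketch of this "renumber only when precision runs out" idea, assuming the positions live in a plain sorted Python list. The name `insert_between` and the return value are assumptions made for illustration, not part of any ORM.

```python
from typing import List


def insert_between(positions: List[float], index: int) -> bool:
    """Insert a new position between positions[index - 1] and positions[index].

    Assumes 0 < index < len(positions).
    Returns True if every row had to be renumbered (the slow path).
    """
    lo, hi = positions[index - 1], positions[index]
    mid = (lo + hi) / 2
    if lo < mid < hi:
        # Fast path: only the new row is written.
        positions.insert(index, mid)
        return False
    # Slow path: no double fits between the neighbours any more,
    # so every row gets a fresh round number 1, 2, 3, ...
    positions.insert(index, lo)  # placeholder, overwritten below
    positions[:] = [float(i + 1) for i in range(len(positions))]
    return True
```

The fast path touches one row and the slow path touches all of them, which is exactly the uneven timing described above.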

I believe there is a standard algorithm for these cases that regularly and consistently keeps the position numbers, or just some of them, updated. Ideally it should be O(log n), with no big time difference between the worst and best cases.

To be honest, I also think that anything that must be sorted by the user cannot grow large enough to become a real problem in its worst case. The edge case also seems to be extremely rare, even more so if I look for a solution that pushes the border numbers. However, I still believe that there is a standard, well-known solution for this problem that I am not aware of, and I would like to learn about it.

Upvotes: 6

Views: 2229

Answers (4)

SystematicFrank

Reputation: 17269

After some days with no valid answer, this is my theory:

The real challenge here is finding a practical solution. Maybe there is a mathematically correct solution, but with every day that goes by, it seems more likely that its implementation would be very complex. A good solution should not only be mathematically correct, but also balanced against the nature of the problem, the low chance of running into it, and its minor implications. Otherwise it is like killing flies with bullets: extremely effective, but pointless.

I am starting to believe that a good answer could be: to hell with the right solution. Leave it as a one-line calculation and live with the rare case where sorting two elements might fail. It is not worth increasing complexity and investing time or money in such a nitpicky problem, one so rare that it causes no data damage, just a temporary UX glitch.

Upvotes: 0

Nicolas Repiquet

Reputation: 9265

Second try.

Consider the full range of position values, say 0 -> 1000.

The first item we insert should have a position of 500. Our list is now :

(0) -> 500 -> (1000).

If you insert another item at the first position, we end up with :

(0) -> 250 -> 500 -> (1000).

If we keep inserting items at the first position, we are going to have a problem, as our ranges are not equally balanced and... Wait... balanced? Doesn't that sound like a binary tree problem!?

Basically, you store your list as a binary tree. When inserting a node, you assign it a position according to the surrounding nodes. When your tree becomes unbalanced, you rotate nodes to balance it again and recompute the positions of the rotated nodes!

So :

  • Most of the time, adding a node will not require changing the position of other nodes.
  • When balancing is required, only a subset of your items will be changed.
  • It's O(log n) !
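
A rough sketch in the same spirit, with one important caveat: this is not the tree-rotation scheme itself. It swaps in a simpler stand-in, respacing a small window of neighbours when a gap is exhausted, and it makes no claim about the O(log n) bound. The names `LO`, `HI` and `insert_at`, and the integer positions, are assumptions for illustration.

```python
from typing import List

LO, HI = 0, 1000  # bounds of the position range, as in the example above


def insert_at(positions: List[int], index: int) -> int:
    """Insert a new position at `index` (0 <= index <= len) in a sorted list.

    Returns how many existing positions were rewritten.
    """
    left = positions[index - 1] if index > 0 else LO
    right = positions[index] if index < len(positions) else HI
    if right - left >= 2:
        # Common case: take the midpoint, nobody else moves.
        positions.insert(index, (left + right) // 2)
        return 0
    # No free integer here: respace a window around the insertion point.
    positions.insert(index, left)  # placeholder, overwritten below
    n = len(positions)
    width = 1
    while True:
        start, end = max(0, index - width), min(n, index + width + 1)
        lo = positions[start - 1] if start > 0 else LO
        hi = positions[end] if end < n else HI
        count = end - start
        whole = start == 0 and end == n
        free = hi - lo - 1  # integers strictly between lo and hi
        # Ask for some slack so the next insert nearby is cheap again.
        if free >= 2 * count or (whole and free >= count):
            break
        if whole:
            raise OverflowError("position range is full")
        width *= 2
    span = hi - lo
    for k in range(count):
        # Spread the window's items evenly between lo and hi.
        positions[start + k] = lo + (k + 1) * span // (count + 1)
    return count - 1
```

Inserting repeatedly at the first position gives 500, 250, 125 and so on, and once the gap next to 0 is used up only the few items nearest the front are respaced, not the whole list.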

EDIT

algorithm explained

Upvotes: 4

Nicolas Repiquet

Reputation: 9265

This doesn't really answer the question, but...

As you talked about "adding a position field to your data", I suppose that your data store is a relational database and that your data has some kind of identifier.

So maybe you can implement a doubly linked list by adding a previous_data_id and a next_data_id to your data. Insert/move/remove operations are then O(1).

Loading such a collection from a database is rather easy:

  • Fetch each item and add it to a map with its id as key.
  • Connect each item with its previous and next items.
  • Starting with the first item (the one whose previous_data_id is undefined), follow the chain and add each item to a list.
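
The steps above could be sketched like this, assuming each fetched row is a dict with `id`, `previous_data_id` and `next_data_id` keys, and that a missing link is stored as `None`. The function name `load_ordered` is hypothetical.

```python
from typing import Any, Dict, List


def load_ordered(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Rebuild the user-defined order from previous/next id links."""
    # Step 1: index every row by its id.
    by_id = {row["id"]: row for row in rows}
    # Step 3 start: the head is the row with no previous item.
    current = next((r for r in rows if r["previous_data_id"] is None), None)
    ordered: List[Dict[str, Any]] = []
    seen = set()
    while current is not None:
        if current["id"] in seen:
            raise ValueError("cycle detected in the linked list")
        seen.add(current["id"])
        ordered.append(current)
        # Step 2 happens lazily: the map lookup resolves the "next" link.
        next_id = current["next_data_id"]
        current = by_id.get(next_id) if next_id is not None else None
    return ordered
```

The rows can come back from the database in any order, since the chain alone decides the final sequence.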

Upvotes: 0

Gareth Rees

Reputation: 65854

If the user is actually sorting the list manually, then is there really any need to worry about taking O(n) to record the new order? It's O(n) in any case just to display the list to the user.

Upvotes: 0
