akkie
akkie

Reputation: 2543

Sort a list by an ordered index

Let us assume that I have the following two sequences:

val index = Seq(2,5,1,4,7,6,3)
val unsorted = Seq(7,6,5,4,3,2,1)

The first is the index by which the second should be sorted. My current solution is to traverse over the index and construct a new sequence with the found elements from the unsorted sequence.

val sorted  = index.foldLeft(Seq[Int]()) { (s, num) => 
  s ++ Seq(unsorted.find(_ == num).get)
}

But this solution seems very inefficient and error-prone to me. On every iteration it searches the complete unsorted sequence. And if the index and the unsorted list aren't in sync, then either an error will be thrown or an element will be omitted. In both cases, the not in sync elements should be appended to the ordered sequence.

Is there a more efficient and solid solution for this problem? Or is there a sort algorithm which fits into this paradigm?


Note: This is a constructed example. In reality I would like to sort a list of mongodb documents by an ordered list of document Id's.


Update 1

I've selected the answer from Marius Danila because it seems the more fastest and scala-ish solution for my problem. It doesn't come with a not in sync item solution, but this could be easily implemented.

So here is the updated solution:

def sort[T: ClassTag, Key](index: Seq[Key], unsorted: Seq[T], key: T => Key): Seq[T] = {
  val positionMapping = HashMap(index.zipWithIndex: _*)
  val inSync = new Array[T](unsorted.size)
  val notInSync = new ArrayBuffer[T]()
  for (item <- unsorted) {
    if (positionMapping.contains(key(item))) {
      inSync(positionMapping(key(item))) = item
    } else {
      notInSync.append(item)
    }
  }

  inSync.filterNot(_ == null) ++ notInSync
}

Update 2

The approach suggested by Bask.cc seems the correct answer. It also doesn't consider the not in sync issue, but this can also be easily implemented.

val index: Seq[String]
val entities: Seq[Foo]
val idToEntityMap = entities.map(e => e.id -> e).toMap
val sorted = index.map(idToEntityMap)
val result = sorted ++ entities.filterNot(sorted.toSet)

Upvotes: 8

Views: 3693

Answers (7)

Cory Klein
Cory Klein

Reputation: 55640

This may not exactly map to your use case, but Googlers may find this useful:

scala> val ids = List(3, 1, 0, 2)
ids: List[Int] = List(3, 1, 0, 2)

scala> val unsorted = List("third", "second", "fourth", "first")
unsorted: List[String] = List(third, second, fourth, first)

scala> val sorted = ids map unsorted
sorted: List[String] = List(first, second, third, fourth)

Upvotes: 3

Bask.ws
Bask.ws

Reputation: 843

Why do you want to sort collection, when you already have sorted index collection? You can just use map

Concerning> In reality I would like to sort a list of mongodb documents by an ordered list of document Id's.

val ids: Seq[String]
val entities: Seq[Foo]
val idToEntityMap = entities.map(e => e.id -> e).toMap

ids.map(idToEntityMap _)

Upvotes: 4

James_pic
James_pic

Reputation: 3307

The best I can do is to create a Map from the unsorted data, and use map lookups (basically the hashtable suggested by a previous poster). The code looks like:

val unsortedAsMap = unsorted.map(x => x -> x).toMap
index.map(unsortedAsMap)

Or, if there's a possibility of hash misses:

val unsortedAsMap = unsorted.map(x => x -> x).toMap
index.flatMap(unsortedAsMap.get)

It's O(n) in time*, but you're swapping time for space, as it uses O(n) space.

For a slightly more sophisticated version, that handles missing values, try:

import scala.collection.JavaConversions._
import scala.collection.mutable.ListBuffer

val unsortedAsMap = new java.util.LinkedHashMap[Int, Int]
for (i <- unsorted) unsortedAsMap.add(i, i)

val newBuffer = ListBuffer.empty[Int]
for (i <- index) {
  val r = unsortedAsMap.remove(i)
  if (r != null) newBuffer += i
  // Not sure what to do for "else"
}

for ((k, v) <- unsortedAsMap) newBuffer += v

newBuffer.result()

If it's a MongoDB database in the first place, you might be better retrieving documents directly from the database by index, so something like:

index.map(lookupInDB)

*technically it's O(n log n), as Scala's standard immutable map is O(log n), but you could always use a mutable map, which is O(1)

Upvotes: 1

Marius Danila
Marius Danila

Reputation: 10411

Ok.

Let's start from the beginning. Besides the fact you're rescanning the unsorted list each time, the Seq object will create, by default a List collection. So in the foldLeft you're appending an element at the end of the list each time and this is a O(N^2) operation.

An improvement would be

val sorted_rev  = index.foldLeft(Seq[Int]()) { (s, num) => 
  unsorted.find(_ == num).get +: s
}
val sorted = sorted_rev.reverse

But that is still an O(N^2) algorithm. We can do better.

The following sort function should work:

def sort[T: ClassTag, Key](index: Seq[Key], unsorted: Seq[T], key: T => Key): Seq[T] = {
  val positionMapping = HashMap(index.zipWithIndex: _*) //1
  val arr = new Array[T](unsorted.size) //2
  for (item <- unsorted) { //3
    val position = positionMapping(key(item))
    arr(position) = item
  }
  arr //6
}

The function sorts a list of items unsorted by a sequence of indexes index where the key function will be used to extract the id from the objects you're trying to sort.

Line 1 creates a reverse index - mapping each object id to its final position.

Line 2 allocates the array which will hold the sorted sequence. We're using an array since we need constant-time random-position set performance.

The loop that starts at line 3 will traverse the sequence of unsorted items and place each item in it's meant position using the positionMapping reverse index

Line 6 will return the array converted implicitly to a Seq using the WrappedArray wrapper.

Since our reverse-index is an immutable HashMap, lookup should take constant-time for regular cases. Building the actual reverse-index takes O(N_Index) time where N_Index is the size of the index sequence. Traversing the unsorted sequence takes O(N_Unsorted) time where N_Unsorted is the size of the unsorted sequence.

So the complexity is O(max(N_Index, N_Unsorted)), which I guess is the best you can do in the circumstances.

For your particular example, you would call the function like so:

val sorted = sort(index, unsorted, identity[Int])

For the real case, it would probably be like this:

val sorted = sort(idList, unsorted, obj => obj.id)

Upvotes: 1

Lev Khomich
Lev Khomich

Reputation: 2247

In this case you can use zip-sort-unzip:

(unsorted zip index).sortWith(_._2 < _._2).unzip._1

Btw, if you can, better solution would be to sort list on db side using $orderBy.

Upvotes: 1

Noah
Noah

Reputation: 13959

Flat Mapping the index over the unsorted list seems to be a safer version (if the index isn't found it's just dropped since find returns a None):

index.flatMap(i => unsorted.find(_ == i))

It still has to traverse the unsorted list every time (worst case this is O(n^2)). With you're example I'm not sure that there's a more efficient solution.

Upvotes: 1

sushil
sushil

Reputation: 165

I do not know the language that you are using. But irrespective of the language this is how i would have solved the problem.

From the first list (here 'index') create a hash table taking key as the document id and the value as the position of the document in the sorted order.

Now when traversing through the list of document i would lookup the hash table using the document id and then get the position it should be in the sorted order. Then i would use this obtained order to sort in a pre allocated memory.

Note: if the number of documents is small then instead of using hashtable u could use a pre allocated table and index it directly using the document id.

Upvotes: 1

Related Questions