user2467545
user2467545

Reputation:

Efficiently find an element in multiple sorted lists?

Problem Statement:-

I was ask this interview question recently.. I was able to come up with the below code only which runs in O(k log n)-

Given k <= n sorted arrays each of size n, there exists a data structure requiring O(kn) preprocessing time and memory that answers iterated search queries in O(k + log n) time.

I have k sorted Lists, each of size n. Currently I have hard coded 5 sorted Lists each of size 3 but in general that can be very high number-

I would like to search for single element in each of the k Lists.

Obviously, I can binary search each array individually, which will result in O(k log n) where k is number of sorted arrays.

Can we do it in O(k + log n) where k is the number of sorted arrays? As I think there might be some better way of doing it as we're doing the same searches k times as of now -

private List<List<Integer>> dataInput;

public SearchItem(final List<List<Integer>> inputs) {
    dataInput = new ArrayList<List<Integer>>();
    for (List<Integer> input : inputs) {
        dataInput.add(new ArrayList<Integer>(input));
    }
}

public List<Integer> getItem(final Integer x) {
    List<Integer> outputs = new ArrayList<Integer>();
    for (List<Integer> data : dataInput) {
        int i = Collections.binarySearch(data, x); // binary searching the item
        if (i < 0)
            i = -(i + 1);
        outputs.add(i == data.size() ? null : data.get(i));
    }
    return outputs;
}

public static void main(String[] args) {
    List<List<Integer>> lists = new ArrayList<List<Integer>>();

    List<Integer> list1 = new ArrayList<Integer>(Arrays.asList(3, 4, 6));
    List<Integer> list2 = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
    List<Integer> list3 = new ArrayList<Integer>(Arrays.asList(2, 3, 6));
    List<Integer> list4 = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
    List<Integer> list5 = new ArrayList<Integer>(Arrays.asList(4, 8, 13));

    lists.add(list1);
    lists.add(list2);
    lists.add(list3);
    lists.add(list4);
    lists.add(list5);

    SearchItem search = new SearchItem(lists);
    System.out.println(dataInput);

    List<Integer> dataOuput = search.getItem(5);

    System.out.println(dataOuput);
}

Whatever output I am seeing with my above code approach should come with the new approach as well which should work in O(k + log n).

Is this possible to achieve? Can anyone provide an example how would this work basis on my example?

Upvotes: 5

Views: 761

Answers (3)

Niklas B.
Niklas B.

Reputation: 95308

The technique is called Fractional cascading which sounds very cool. What you do is the following:

  1. Take list 1. Take every second element of it and merge it into list 2. Now "the new" list 2 contains all its elements and half of the ones from list 1. You remember which ones are from list 1 and pointers back to list 1 and then you pass through the newly created list 2 from front to back, adding for every element a pointer to the last element from list 1 that you saw and to the last element from list 2 that you saw. Do the same from back to front.
  2. Take the "new" list 2 with half of list 1's elements embedded and merge it with list 3 etc.

The resulting interleaving will look something like this:

fractional cascading

(Source: "You could have invented fractional cascading" by Edward Z. Yang)

and every list element will have a couple of pointers to find predecessors/successors of a certain kind fast and to find the position in the list i - 1.

Turns out the total number of list elements is only increased by a constant factor, but the cool thing is that you can now do queries fast:

  1. Do a binary search in "new" list k to find your search element. Complexity: O(log n). You now found the element in the original list k, because you can find in O(1) the surrounding elements that were originally in list k.
  2. You can also find the position of the element in list k - 1 in O(1) because you have pointers to the successor/predecessor in list k - 1. So you can report the result for all the other lists in O(1) each

Total runtime: O(log n + k)

For more information, you should definitely read the blog post, it has lots of visualizing illustrations and additional explanations.

Upvotes: 4

ksun
ksun

Reputation: 1427

Someone else has probably answered this by now (I haven't refreshed the page). But here is a method for merging the list that should work in O(hn). I haven't actually tested the syntax in an editor, but I think the idea should work...

After calling this method you should just be able to do a binary search on the merged list.

public static List<Integer> mergeSortedLists(List<List<Integer>> sortedLists){
  List<Integer> mergedList = new List<Integer>();
  int listIndexes[] = new int[sortedLists.size];
  //initialize indexes to 0
  for(int i=0; i<sortedLists.Count(); i++){
    listIndex[i] = 0;
  }  
  int completedLists=0;
  int lowestValue;
  int lowestIndex;
  while(completedLists < sortedLists.Count()){  
    lowestValue = sortedLists[0][listIndexes[0]];
    lowestIndex = 0;
    for(int i=0; i<sortedLists.Count(); i++){      
      int currentIndex = listIndexes[i];      
      List<Integer> currentList = sortedLists[i];
      if(currentIndex >= currentList) continue; //already finished merging this list skip
      int currentValue = currentList[currentIndex];
      if(currentValue < lowestValue){
         lowestValue = currentValue;
         lowestIndex = currentIndex;
      }
    }
    //put the lowest found value into mergedList and increment index
    mergedList.Add(lowestValue);
    listIndexes[lowestIndex]++;
    //if incremented index is equal to increment completed Lists - when all lists are marked
    //complete the while loop will be broken out of and merge should be complete
    if(listIndexes[lowestIndex] == sortedLists[lowestIndex].Count()){
        completedLists++;   
    }
  }
  return mergedList;
}

Upvotes: 0

Alex Suo
Alex Suo

Reputation: 3119

Since your arrays are sorted, the elements are comparable. Using a B-tree structure, and make sure the arrays doesn't have overlapping segments i.e. each array are sorted and any item inside is either

item < first for all the other arrays; or item > last for all the other arrays.

Then O(k + logn) is achieved by comparing the searching item so that first < search item < last; then do a log(n) search inside.

But essentially this can be O(logk + logn).

Upvotes: 0

Related Questions