nTuply
nTuply

Reputation: 1374

Is there something faster than Collections.sort() in Java?

I made a median filter algorithm and I want to optimize it. Currently it's taking around 1 second to filter 2MM lines(a file read into an ArrayList elements) and I am trying to reduce it to less(maybe half the time?) I'm using ArrayLists for my algorithm and minimized the use of nested loops as well to avoid an increase in time, however I still can't achieve lower than 0.98 seconds tops.

Here's a code snippet that does the median filter:

//Start Filter Algorithm 2
        int index=0;
        while(index<filterSize){
            tempElements.add(this.elements.get(index+counter)); //Add element to a temporary arraylist
            index+=1;
            if(index==filterSize){
                outputElements.add(tempElements.get((filterSize-1)/2)); //Add median Value to output ArrayList
                tempElements.clear(); //Clear temporary ArrayList
                index = 0; //Reset index
                counter+=1; //Counter increments by 1 to move to start on next element in elements ArrayList                    
            }
            if(elementsSize-counter <filterSize){
                break; //Break if there is not enough elements for the filtering to work
            }
        }

What's happening is that I'm looping through the elements arraylist for the filterSize I provided. Then I add the elements to a temporary(tempElements) arraylist, sort it using Collections.sort()(this is what I want to avoid), find the median value and add it to my final output arraylist. Then I clear the tempElements arraylist and keep going through my loop until I cannot filter anymore due to the lack of elements(less than filterSize).

I'm just looking for a way to optimize it and get it faster. I tried to use a TreeSet but I cannot get the value at an index from it.

Thanks

Upvotes: 3

Views: 4200

Answers (2)

Patrick87
Patrick87

Reputation: 28332

As an alternative to Giovanni Botta's excellent answer:

Suppose that you have an array [7, 3, 8, 4, 6, 6, 2, 4, 6] and a filterSize of 4. Then our first temp array will be [7, 3, 8, 4] and we can sort it to get [3, 4, 7, 8]. When we compute our next temporary array, we can do it in linear (or better?) time as follows:

  1. remove 7
  2. insert 6 in sorted position

We can repeat this for all temp arrays after the initial sort. If you're spending a lot of time sorting subarrays, this might not be a bad way to go. The trick is that it increases required storage since you need to remember the order in which to remove the entries, but that shouldn't be a big deal (I wouldn't think).

Upvotes: 2

Giovanni Botta
Giovanni Botta

Reputation: 9816

The Java Collections.sort() implementation is as fast as it gets when it comes to sorting (dual pivot quick sort).

The problem here is not in the nitty gritty details but the fact that you are sorting at all! You only need to find the median and there are linear algorithms for that (sorting is log-linear). See selection for some inspiration. You might need to code it yourself since I don't think the java library has any public implementation available.

The other thing I suggest is to use a fixed size array (created once) instead of an ArrayList. Since you know the size of the filter beforehand that will give you a small speed boost.

Also I don't see how avoiding for loops helps performance in any way. Unless you profiled it and proved that it's the right thing to do, I would just write the most readable code possible.

Finally, TreeSet or any other kind of sorted data structure won't help either because the time complexity is log-linear for n insertions.

Upvotes: 9

Related Questions