Reputation: 10665
I want to maintain an ordered List<Integer>
of size <= 10^6. Every time a new element will be added I will call Collections.sort()
method to sort the new element in the list. As per my knowledge ArrayList
is better performing than LinkedList
. But since I will be calling sort()
method quite often, I have come to understanding that linkedList
will perform better when sorting the list and will be a better choice over ArrayList
, since there is no shifting of elements as in case of ArrayList
(uses array
as underlying data structure). Any suggestions which will be more efficient.
Upvotes: 14
Views: 4513
Reputation: 43068
Since java does not have built in multiset, which is the perfect data structure for your situation, I will suggest using the TreeMultiset found in the guava library.
Multisets allow for duplicate elements, and a tree multiset will also add the benefit of keeping your collection sorted.
Upvotes: 6
Reputation: 73528
Calling sort()
on a LinkedList
is devastating on the performance, due to the default implementation of List.sort()
converting the List
to an array for sorting. There are very few cases in which it makes sense to use a LinkedList
, even though it may seem like it should be effective.
If you wish to have the collection always sorted, you really should use an ordered collection like a TreeSet
or maybe even a PriorityQueue
. It will provide cleaner code (as well as faster sorting), since you don't have to worry about calling sort()
yourself all the time.
Upvotes: 2
Reputation: 328568
You could use Collections#binarySearch
on the sorted list to find the correct insertion point. ArrayList would probably perform better than a LinkedList, especially for larg-ish sizes, but that is easy to test.
I ran a micro benchmark of various methods: using a sort after each insertion or a binarySearch to insert in the right place, both with ArrayList (AL) and LinkedList (LL). I also added Commons TreeList and guava's TreeMultiset.
Conclusions
TreeMultiset
, but it is not a list strictly speaking - the next best option is to use an ArrayList
+ binarySearchCode of the best performer, for reference:
@Benchmark public ArrayList<Integer> binarySearchAL() {
ArrayList<Integer> list = new ArrayList<> ();
Random r = new Random();
for (int i = 0; i < n; i++) {
int num = r.nextInt();
int index = Collections.binarySearch(list, num);
if (index >= 0) list.add(index, num);
else list.add(-index - 1, num);
current = list.get(0); //O(1), to make sure the sort is not optimised away
}
return list;
}
Full code on bitbucket.
Full results
The "Benchmark" column contains the name of the method under test (baseLine just fills a list without sorting it, the other methods have explicit names: AL=ArrayList, LL=LinkedList,TL=Commons TreeList,treeMultiSet=guava), (n) is the size of the list, Score is the time taken in milliseconds.
Benchmark (n) Mode Samples Score Error Units
c.a.p.SO28164665.baseLine 100 avgt 10 0.002 ± 0.000 ms/op
c.a.p.SO28164665.baseLine 1000 avgt 10 0.017 ± 0.001 ms/op
c.a.p.SO28164665.baseLine 5000 avgt 10 0.086 ± 0.002 ms/op
c.a.p.SO28164665.baseLine 10000 avgt 10 0.175 ± 0.007 ms/op
c.a.p.SO28164665.binarySearchAL 100 avgt 10 0.014 ± 0.001 ms/op
c.a.p.SO28164665.binarySearchAL 1000 avgt 10 0.226 ± 0.006 ms/op
c.a.p.SO28164665.binarySearchAL 5000 avgt 10 2.413 ± 0.125 ms/op
c.a.p.SO28164665.binarySearchAL 10000 avgt 10 8.478 ± 0.523 ms/op
c.a.p.SO28164665.binarySearchLL 100 avgt 10 0.031 ± 0.000 ms/op
c.a.p.SO28164665.binarySearchLL 1000 avgt 10 3.876 ± 0.100 ms/op
c.a.p.SO28164665.binarySearchLL 5000 avgt 10 263.717 ± 6.852 ms/op
c.a.p.SO28164665.binarySearchLL 10000 avgt 10 843.436 ± 33.265 ms/op
c.a.p.SO28164665.sortAL 100 avgt 10 0.051 ± 0.002 ms/op
c.a.p.SO28164665.sortAL 1000 avgt 10 3.381 ± 0.189 ms/op
c.a.p.SO28164665.sortAL 5000 avgt 10 118.882 ± 22.030 ms/op
c.a.p.SO28164665.sortAL 10000 avgt 10 511.668 ± 171.453 ms/op
c.a.p.SO28164665.sortLL 100 avgt 10 0.082 ± 0.002 ms/op
c.a.p.SO28164665.sortLL 1000 avgt 10 13.045 ± 0.460 ms/op
c.a.p.SO28164665.sortLL 5000 avgt 10 642.593 ± 188.044 ms/op
c.a.p.SO28164665.sortLL 10000 avgt 10 1182.698 ± 159.468 ms/op
c.a.p.SO28164665.binarySearchTL 100 avgt 10 0.056 ± 0.002 ms/op
c.a.p.SO28164665.binarySearchTL 1000 avgt 10 1.083 ± 0.052 ms/op
c.a.p.SO28164665.binarySearchTL 5000 avgt 10 8.246 ± 0.329 ms/op
c.a.p.SO28164665.binarySearchTL 10000 avgt 10 735.192 ± 56.071 ms/op
c.a.p.SO28164665.treeMultiSet 100 avgt 10 0.021 ± 0.001 ms/op
c.a.p.SO28164665.treeMultiSet 1000 avgt 10 0.288 ± 0.008 ms/op
c.a.p.SO28164665.treeMultiSet 5000 avgt 10 1.809 ± 0.061 ms/op
c.a.p.SO28164665.treeMultiSet 10000 avgt 10 4.283 ± 0.214 ms/op
For 100k items:
c.a.p.SO28164665.binarySearchAL 100000 avgt 6 890.585 ± 68.730 ms/op
c.a.p.SO28164665.treeMultiSet 100000 avgt 6 105.273 ± 9.309 ms/op
Upvotes: 18
Reputation: 3307
Under Oracle Java / OpenJDK 7 or higher, the asymptotic performance of both would be similar. Collections.sort
loads the list into an array, sorts the array, and loads the array back into the list by iterating through it (using a ListIterator
), replacing its elements.
In both cases, this is an array sort on a mostly-sorted array (which is O(n)
in OpenJDK 7 and higher, since it uses timsort), plus two list iterations (which are O(n)
in both cases - although I'd expect LinkedList
to have a worse constant term). So overall, it's an O(n)
process, but likely to be slower for LinkedList
.
If you're bulk inserting elements, the bulk insert will be O(n^2)
overall, which is slower than inserting them all and sorting, or following Smac89
's suggestion of using a TreeMultiset
(both would be O(n log(n))
).
And just for fun, here's a truly awful way to abuse TreeSet
to allow it to store duplicate elements:
public class AwfulComparator<E extends Comparable<E>> implements Comparator<E> {
public int compare(E o1, E o2) {
int compared = o1.compareTo(o2);
return (compared == 0)?1:compared; // Never compare equal
}
}
new TreeSet<String>(new AwfulComparator<>());
Upvotes: 1
Reputation: 1271
You should consider to use data structures that are designed to maintain order if the sorting is your main performance consideration.
Using the normal Java base classes you could use either of these:
PriorityQueue (in case you want to retain duplicates)
TreeSet (filter duplicates)
In any case it will be easiest to just prototype all versions and run some benchmarks + profiling.
Upvotes: 1