Reputation:
First and foremost-- I have a file of strings. The smallest file is about 20 strings. The largest file is currently 12,000 strings of varying lengths (anywhere from one character to about 80). I suspect I may have up to a 60,000 string file in the future.
Initially I made a standard array of strings with a default size of 200 and doubled the size and copied the array to a new array if needed (while reading the file into the array). This method was pretty fast. However, the readability and extra coding for methods like search or contains was not appealing. I tried a List interface instead-- and read the file in using the typical list.add(line) until there were no more lines.
My question is: What is the default size of an ArrayList<> and does this method result in too many allocations/resizes? Is there any performance points I should know about these two methods and which would be better?
Upvotes: 0
Views: 4126
Reputation:
ArrayList defaults to size 10. The amortized cost is not very expensive, even if you start with size 1. You could turn the cost down to nearly 0 if you initialize it with a high capacity:
List myList = new ArrayList<String>(100000);
Also, you should realize that the List
interface doesn't intrinsically have any performance standards. Its implementations like LinkedList
and ArrayList
do.
Edit: I'm lazy and would never use a straight array. ArrayList
pretty much is the array with all of the functions like add()
and remove()
built in. The traditional list implementation, the ArrayList
, is the alternative that I would usually consider, but if you are going to be searching the thing I'd suggest sorting it once after you're done loading it, and using an ArrayList
to make use of that with binary search.
Upvotes: 3
Reputation: 3409
This sounds like premature optimization to me (unless you're coding for mobile or very underpowered hardware). Short answer: always use ArrayList unless you have a very clear reason not to.
You're no doubt going to get responses talking about the costs of resizing, initial allocation sizes, etc... but in reality, loading / manipulating 60k strings is absolute peanuts in terms of processing time on today's hardware. Many old-school java people still have hangovers from the days in which object allocation and general memory operations were super slow.
In general, you can almost always get at least a slight performance boost by rolling your own implementation that is more "aware" of your problem domain than Java.util is, but the effort is rarely worth it. I'd just start with an ArrayList sized to say 60k elements (which is also absolute peanuts in terms of memory usage).
I recently worked on a project which managed complex data structures of 1-2 GB worth of millions of strings, and the standard out-of-the-box ArrayList and HashMap were more than sufficient.
Upvotes: 0
Reputation: 14256
I'm assuming what you're trying to differentiate between is using a LinkedList and an ArrayList.
And judging from your question, it looks like you care about the functions add and search.
If you're doing a lot of one off adding, LinkedList is going to be faster since it always has an O(1) cost for adds, while an array has to double periodically. Though as @bdares pointed out, you could just specify a large initial capacity, though you could end up with a lot of wasted memory doing this.
As far as contains goes, an ArrayList will be faster due to cache locality. Though both employ a linear search, the ArrayList will loop faster.
Might I suggest that if you don't care about the order you retrieve things, to go with a HashMap if you're looking to do a lot of contains calls. This will be significantly faster.
Upvotes: 0
Reputation: 285440
Most collections have a constructor that allows you to set an initial capacity. I know that ArrayList also has a method that allows you to increase the capacity of the list to a set minimum number, ensureCapacity
, and that setting these appropriately can have a significant effect on the time cost of using the collection.
Upvotes: 2