markvgti

Reputation: 4619

Which implementation to use when creating a List from Iterable

I find myself frequently doing the following:

Iterator<A> itr = iterableOfA.getIterator();
List<B> list = new ArrayList<>(); // how about LinkedList?
while (itr.hasNext()) {
    B obj = itr.next().getB();
    list.add(obj);
}
someMethod(list); // this method takes an Iterable

I have no idea how many elements are likely to be in iterableOfA; it could be 5, or it could be 5000. In this case, would LinkedList be a better implementation to use here (since list.add(obj) would then be O(1))? As it stands, if iterableOfA has 5000 elements, this will lead to many resizings of list's backing array.

The other option is to do:

Iterator<A> itr = iterableOfA.getIterator();
int size = Iterables.size(iterableOfA); // from Guava
List<B> list = new ArrayList<>(size);
// and the rest...

This means iterating over iterableOfA twice. Which option would be best when the size of the iterable is unknown and can vary wildly:

  1. Just use ArrayList.
  2. Just use LinkedList.
  3. Count the elements in iterableOfA and allocate an ArrayList.

Edit 1

To clarify some details:

  1. I am optimizing primarily for performance and secondarily for memory usage.
  2. list is a short-lived allocation: by the end of the request, no code should be holding a reference to it.

Edit 2

For my specific case, I realized that someMethod(list) doesn't handle an iterable with greater than 200 elements, so I decided to go with new ArrayList<>(200) which works well enough for me.

However, in the general case I would have preferred to implement the solution outlined in the accepted answer (wrap in a custom iterable, obviating the need for allocating a list).

All the other answers gave valuable insight into how suitable ArrayList is compared to LinkedList, so on behalf of the general SO community I thank you all!

Upvotes: 2

Views: 153

Answers (4)

maaartinus

Reputation: 46472

LinkedList is a cache-hostile, memory-eating monster, one which its father (Joshua Bloch) regrets.

I'd bet it's not faster in your case, as ArrayList resizing is optimized and also takes amortized O(1) per element.

Basically, the only case when LinkedList is faster, is the following loop:

for (Iterator<E> it = list.iterator(); it.hasNext(); ) {
    E e = it.next();
    if (someCondition(e)) it.remove(); // Iterator.remove(), not e.remove()
}

As it stands, if iterableOfA has 5000 elements, this will lead to many resizings of list's backing array.

"Many" is something like log(5000 / 10) / log(1.5), i.e., about 15. But the count doesn't matter much, as the last few resizings dominate the cost. You'll be copying each object reference maybe twice; that's cheap.

Assuming you'll actually do something with the list, this copying is very cheap.
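That estimate is easy to check with a quick sketch. This assumes OpenJDK's ArrayList growth rule, newCapacity = oldCapacity + (oldCapacity >> 1), and the default initial capacity of 10; the class and method names are mine:

```java
public class ResizeCount {
    // Count how many times the backing array would grow before it can
    // hold n elements, starting from the default capacity of 10.
    static int resizesFor(int n) {
        int capacity = 10, resizes = 0;
        while (capacity < n) {
            capacity += capacity >> 1; // oldCapacity + (oldCapacity >> 1)
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(resizesFor(5000)); // prints 16
    }
}
```

So roughly 16 grow-and-copy steps for 5000 elements, in line with the logarithmic estimate above.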


Iterating just to find out the number of elements might help in some cases, but the speed depends on the input Iterable. So unless you need the speed really badly and you know that the input is never really slow, I'd refrain from such an optimization.

Upvotes: 1

jtahlborn

Reputation: 53694

I would completely skip copying the elements to a new collection.

We have utility code for easily wrapping Iterators into Iterables and Filter for converting between types, but the gist of it is:

final Iterable<A> iofA = ...;
Iterable<B> iofB = new Iterable<B>() {
  public Iterator<B> iterator() {
    return new Iterator<B>() {
      private final Iterator<A> _iter = iofA.iterator();
      public boolean hasNext() { return _iter.hasNext(); }
      public B next() { return _iter.next().getB(); }
    };
  }
};

No additional storage, etc. necessary.
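A self-contained version of the same idea can be sketched as follows; the record types A and B here are stand-ins for the question's classes, and viewOfB is a name I made up:

```java
import java.util.Iterator;
import java.util.List;

public class LazyView {
    // Stand-in types for the question's A and B.
    record B(int value) {}
    record A(B b) { B getB() { return b; } }

    // Wrap an Iterable<A> as an Iterable<B> without copying anything;
    // each element is converted lazily as the consumer iterates.
    static Iterable<B> viewOfB(Iterable<A> iofA) {
        return () -> new Iterator<B>() {
            private final Iterator<A> iter = iofA.iterator();
            public boolean hasNext() { return iter.hasNext(); }
            public B next() { return iter.next().getB(); }
        };
    }

    public static void main(String[] args) {
        Iterable<A> as = List.of(new A(new B(1)), new A(new B(2)));
        for (B b : viewOfB(as)) System.out.println(b.value()); // prints 1 then 2
    }
}
```

Since the question already uses Guava, Iterables.transform(iterableOfA, A::getB) gives the same lazy view with no custom code.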

Upvotes: 1

Stephen C

Reputation: 719199

Which option would be best when the size of the iterable is unknown and can vary wildly

It depends what you are optimizing for.

  • If you are optimizing for performance, then using ArrayList is probably faster. Even though ArrayList will need to resize the backing array, it does this using an exponential growth pattern. However, it depends on the overheads of iteration.

  • If you are optimizing for long-term memory usage, consider using ArrayList followed by trimToSize().

  • If you are optimizing for peak memory usage, the "count first" approach is probably the best. (This assumes that you can iterate twice. If the iterator is actually a wrapper for a lazy calculation .... this may be impossible.)

  • If you are optimizing to reduce GC, then "count first" is probably best, depending on the details of the iteration.
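The trimToSize() and "count first" variants above can be sketched with plain java.util; the method names are mine:

```java
import java.util.ArrayList;
import java.util.List;

public class FillStrategies {
    // Variant 1: grow as you go, then shed the slack; useful when the
    // list is long-lived and memory matters more than the extra copy.
    static <T> List<T> fillAndTrim(Iterable<? extends T> src) {
        ArrayList<T> list = new ArrayList<>();
        for (T t : src) list.add(t);
        list.trimToSize(); // shrink the backing array to list.size()
        return list;
    }

    // Variant 2: count first, then allocate exactly once.
    // Only safe when the Iterable can be traversed twice.
    static <T> List<T> countThenFill(Iterable<? extends T> src) {
        int size = 0;
        for (T ignored : src) size++;
        ArrayList<T> list = new ArrayList<>(size);
        for (T t : src) list.add(t);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(countThenFill(List.of(1, 2, 3))); // prints [1, 2, 3]
    }
}
```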

In all cases you would be advised to:

  1. Profile your application before you spend more time on this issue. In many cases you will find that it is simply not worth the effort to optimize.

  2. Benchmark the two alternatives that you are considering, using the classes and typical data structures from your application.


As it stands, if iterableOfA has 5000 elements, this will lead to many resizings of list's backing array.

The ArrayList class resizes to a new size that is proportional to the current size. That means the number of resizings is O(logN), and the overall cost of N list appends is O(N).

Upvotes: 2

Rajesh Jose

Reputation: 342

The 3rd option is not bad. To get the size, most collections just return a counter that they maintain internally; they do not iterate through the entire list. It depends on the implementation, but all of the java.util collection classes do it that way.

If you know the potential types of "iterableOfA", you can check how they compute their size.
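For what it's worth, Guava's Iterables.size (already used in the question) performs exactly this check: it returns the O(1) Collection.size() when it can, and only walks the iterator otherwise. A minimal sketch of the same idea, where sizeHint and the fallback parameter are hypothetical names of mine:

```java
import java.util.Collection;
import java.util.List;

public class SizeHint {
    // Use the O(1) size when the Iterable is really a Collection;
    // otherwise fall back to a caller-supplied default capacity.
    static int sizeHint(Iterable<?> iterable, int fallback) {
        return (iterable instanceof Collection<?> c) ? c.size() : fallback;
    }

    public static void main(String[] args) {
        Iterable<Integer> lazy = () -> List.of(1, 2, 3).iterator();
        System.out.println(sizeHint(List.of(1, 2, 3), 10)); // prints 3
        System.out.println(sizeHint(lazy, 10)); // prints 10
    }
}
```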

If "iterableOfA" is going to be some custom implementation and you are not sure how its size is computed, LinkedList would be safer. Since your size varies, the potential for resizing is higher, and you will not get predictable performance.

Also, your choice depends on what operations you will perform on the collection that you are filling with "B".

Upvotes: 1
