bridgemnc
bridgemnc

Reputation: 378

LinkedList vs ArrayList: get, add, remove performance comparison - unexpected results for add method

As far as I remember ArrayList should be faster on gets and slower on add and remove. Given that - could you please give me a hint what is wrong with this code if it produces completely different results: remove time is almost the same for ArrayList and LinkedList and addition is faster for ArrayList...

import java.util.ArrayList;
import java.util.LinkedList;

public class ListsTest {
    public static void main(String[] args) {
        ArrayList arrayList = new ArrayList();
        LinkedList linkedList = new LinkedList();

        // ArrayList add
        long startTime = System.nanoTime();

        for (int i = 0; i < 100000; i++) {
            arrayList.add(i);
        }
        long endTime = System.nanoTime();
        long duration = endTime - startTime;
        System.out.println(" ArrayList add: " + duration);

        // LinkedList add
        startTime = System.nanoTime();

        for (int i = 0; i < 100000; i++) {
            linkedList.add(i);
        }
        endTime = System.nanoTime();
        duration = endTime - startTime;
        System.out.println("LinkedList add: " + duration);

        // ArrayList get
        startTime = System.nanoTime();

        for (int i = 0; i < 10000; i++) {
            arrayList.get(i);
        }
        endTime = System.nanoTime();
        duration = endTime - startTime;
        System.out.println(" ArrayList get: " + duration);

        // LinkedList get
        startTime = System.nanoTime();

        for (int i = 0; i < 10000; i++) {
            linkedList.get(i);
        }
        endTime = System.nanoTime();
        duration = endTime - startTime;
        System.out.println("LinkedList get: " + duration);


        // ArrayList remove
        startTime = System.nanoTime();

        for (int i = 9999; i >=0; i--) {
            arrayList.remove(i);
        }
        endTime = System.nanoTime();
        duration = endTime - startTime;
        System.out.println(" ArrayList remove: " + duration);

        // LinkedList remove
        startTime = System.nanoTime();

        for (int i = 9999; i >=0; i--) {
            linkedList.remove(i);
        }
        endTime = System.nanoTime();
        duration = endTime - startTime;
        System.out.println("LinkedList remove: " + duration);
    }
}

My results:

 ArrayList add: 13540332
LinkedList add: 93488785
 ArrayList get: 676937
LinkedList get: 335366109
 ArrayList remove: 529793354
LinkedList remove: 410813052

Edit: As it was mentioned in a few comments it is important whether we add/remove/get to/from the end of the list or whether we use random index. When using random index all results are "correct":

arrayList.get((int) (Math.random()*arrayList.size()));

The same issue is resolved here: https://coderanch.com/t/669483/certification/ArrayList-LinkedList-speed-comparison

Upvotes: 1

Views: 961

Answers (2)

Sebsen36
Sebsen36

Reputation: 494

Technical answer:

On the top level perspective you code looks pretty well and would naively be used by most of Java developers to write some (micro) benchmark tests.

In fact (micro) benchmarking in Java is a really tricky thing to do, since the JVM is a big black box and also the JIT is performing a lot of optimizations to your code - even while executing it! - The beauty and the beast of Java.

For example:

Java is - or was - considered a quite slow language due to it's use of an Interpreter/ JIT. These will only read and compile your instructions only just a moment before being actually executed. This in fact is only true for the first ~10.000 executions (default value). When Java detects a peace of code being executed for more then 10.000 times it performs a deeper analysis on it and "pre-compiles" it. Then on the next execution instead of interpreting it again, the pre-compiled code is used to execute it even faster.

Therefore it got it's name "Hotspot"-JVM - it'll identify the "hot spots" of your code and optimize it accordingly (so that analysis is being performed dynamically during the code execution and not only upfront with static code analysis). Btw: This threshold can be controled with the -XX:CompileThreshold= option of the JVM.

A second thing is, that Java is pretty good in reading your code and finding unused or uneffective instructions. When Java notices them it'll restructure your code to be more efficient or will even completely remove unnecessary instructions from the code base.

This are only two examples of how Java and the JVM optimize the code and there are a lot more! Too many to list all of them here, but there are a lot of great resources out there referring to it and explaining it in details (one short overview is linked here). Also note the used techniques and the success of the optimizations so also differs between the different JVM vendors (Oracle, IBM, SAP, Red Hat, ...).

Therefore to write the unbiased micro benchmarks in Java the JMH (Java Microbenchmark Harness) projects exists and should be used for that. It provides you many insights on of JVM optimizations work and how you can write your tests to measure the impact of them (you have the ability to control some of them with special instructions offered by the JMH) - or to benchmark your code unbiased of any of the optimizations.

This technical benchmarking also depends on the actual machine your code is executed on.

Everything counts:

  • OS (Operating system)
  • Java version
  • JVM being used (vendor)
  • JVM settings
  • memory allocation processes of the JVM
  • invocations of the garbage collector (GC)
  • the installed hardware and drivers (CPU, RAM, HDD/ SSD)
  • current load on the OS and background activities (Spotify/ Netflix running in the background?)
  • many more ...

The JMH also contains methods and strategies (like "JVM warmup runs") to reduce this unique load peaks in order to get an unbiased result - but there will always be an impact on the measurement.

Therefore the concrete benchmarks results cannot be compared between 2 different machines - but they'll give a rough idea how performant a peace of code can work (specially when comparing 2 different algorithms on a single machine - like it is in your case).


Theoretical answer:

As you already mentioned there is the Big-O notation to analyze and evaluate algorithms performance. This will have a look on the structure of an algorithm and evaluate it's performance based on it - neglecting the difficulties descibed in the technical answer above to come to a "qualitative" and not an "absolute" valuation. It'll have a look on the best, average and worst case scenario and then come to a rating based on it.

According to the Big-O model you've already identified the theoretical odd's and benefits of an ArrayList over an LinkedList. But like you also noticed the performance of the data structures (so the implemented algorithms) varies depending on the real data they work with - whether it's reflecting more the best, average or worst case scenario.

In your micro benchmark for example you've reflected more like the best case scenario, in which elements only get added to the end of the lists (as well as being removed from there instead of for example from the middle -> the ArrayList doesn't need to rearrange it's internal array to avoid gaps therein). If you'd change the code/ benchmark to add the elements to the head of the list (or somewhere in the middle), the performance of the ArrayList would also drop drastically already.

This best, average and worst case scenarios therefore can be considered to be implemented as separate benchmarks in the JMH like explained in the technical part of the answer, so that the performance can be compared for each of the scenarios.

But in case you're sure how your data is going to look like and you can specify it very accurately, the benchmarks will be most valuable if they also reflect your real data. In case you cannot specify it, you might implement all the scenarios to get a general overview.


EDIT:

This of course is only the executive point of view on the algorithms - and looks like you're only interested in it.

To complete the answer: I'd only want to mention that there is also the possibility to look on the memory consumption of the algorithms/ data structures. In case it's a requirement you need to consider it might also be worth to take a look on that. For example it also has been discussed here already and gives you a good overview when your interested in it.

Upvotes: 3

Ahmed Ibrahim
Ahmed Ibrahim

Reputation: 144

Arraylist is just an array [x1,x2,x3,x4,null,null,null,null] .. and it must have certain size

  • on add O(1) .. arraylist adding to the end of list is indeed O(1) operation, but because array must have specific size, so keep adding its going to cause from time to time a new array is built with bigger size and you elements will be copied to this new array as O(N/2) operation on average then adding to the end of the list is going to be O(1) again.

well its good practice, if you know how much your size your arraylist will be, you should initialize arraylist with the size so that adding to the end of list would be O(1) operation. much efficient and faster

  • on get O(1) .. as arrays can reach its element by index directly

  • on remove from index O(N) .. a new array is built and elements are copied to the new array with ignoring the element you want to remove

Linkedlist is a sequence of nodes, each node has a pointer to the next element and previous element (Double Linkedlist), a node is just a class which have value, pointer to next node, pointer to previous node

  • to add to specific index O(N) , you will start from the first node and point to the next node and so on, till you find your index, then it will create a node with your value and hook it to the desired index of linkedlist by setting its pointers

  • to get O(N), you are going to start from the first Node, and loop through nodes till you reach your index

  • to remove from specific index O(N), as you will loop through nodes till you reach your node, then you will remove this node completely and changing the pointers of its sibling nodes

as you can see, all your operations done on the code are O(N) except the get of arraylist is O(1), this is why all their time are near each other except get Arraylist is low time interval

Upvotes: 0

Related Questions