Tacitus86
Tacitus86

Reputation: 1394

Avoiding ArrayList Concurrency with multiple threads

I have an ArrayList object:

    List<Sample> dataList = new ArrayList<Sample>();

This has a list of Sample objects. Sample contains a long timestamp and a double value primitives.

I have a program that will act on these via multiple threads. I have a thread that will prune the data 1/hr. The prune takes roughly 2 minutes to complete (low end embedded system and a lot of data). It calls the following function to do this:

 public synchronized void prune(long timestamp)
    {
        Iterator<Sample> it = dataList.listIterator();

        while (it.next().getTimestamp() < timestamp)
        {
            it.remove();
        }       

    }
}

I also have dynamic data being updated to this array via another thread at 1/sec. Depending on the data being added it can call one of the two following functions:

  public synchronized void addPointData(ArrayList<Sample> a)
    {

            a.addAll(dataList);
            dataList = a;

    }

    public synchronized void addPointData(Sample a)
    {

            dataList.add(a);
            if (dataList.size() > 0 && pruneLock == 0 && dataList.get(0).getTimestamp() < (System.currentTimeMillis() - 90000000L) * 1000000)
            {
                dataList.remove(0);
                startTimestamp = dataList.get(0).getTimestamp();
            }       
    }

In so far running this, I haven't had any concurrency exceptions and I don't believe I've had any lost data. I'm concerned about lost data if the pruner makes the Add functions wait for it. Can anyone explain why I haven't had an exception? Should I be doing this a different way?

Upvotes: 0

Views: 80

Answers (2)

rghome
rghome

Reputation: 8819

The fact that you have put synchronize on everything that touches your ArrayList will ensure that you don't get concurrency issues.

On the other hand, your performance will stink. Everytime a prune happens, everything that needs the data list will stop.

You have two big performance issues. The first is that remove on ArrayList is very inefficient. Everytime you remove something from the middle it has to shuffle down everything above it to fill the gap. The only reason this is even tolerable is because it uses System.arrayCopy which is a low level super-optimised call and is quick. But if you do a lot of deletions you are shifting down your tail for every deletion.

One thing that is not clear is whether your Samples are sorted. If they are in order and you can identify the start and end of where you need to prune, you should use removeRange to remove the block in one go.

If your list is sorted and you are removing from the front you are better off using an ArrayDeque as this efficiently supports removing from the front and the back.

Assuming that is not the case and the timestamps are randomly distributed in the array, it might be quicker to fill the gaps as you go with something like:

j = 0;
for (int i = 0; i < dataList.size(); i++) {
    Sample s = dataList.get(i);
    if (s.getTimestamp() >= timestamp) {
        dataList.set(j++, s);
    }
}
removeRange(j, dataList.size());

I haven't tested it, but maybe you get the idea.

Or maybe there is some Java 8 cleverness to do the same thing more elegantly.

But this is still going to lock your whole data structure while the prune is happening, so you could consider pruning in smaller chunks. This will synchronize your data for smaller periods of time and lead to shorter delays.

Upvotes: 1

user1676075
user1676075

Reputation: 3086

Since no one else has answered... good comments, and good points about how to store your data more efficiently. But that aside, if the operations you included are the only operations that modify the list, you have the code correct to prevent concurrent modification exceptions or loss of data. All operations that modify are synchronized. You'll have blocking situations, but you won't ever have concurrent modifications.

Making the add functions wait for the prune to complete is exactly what should happen. As long as there's no other reason the add functions can't wait, it's fine.

As commenters pointed out, though, there are faster ways to go about this that may reduce the overall wait time. Given that you're always deleting by time, if you know things are added in time order, you can significantly optimize the process. If you're sorted by time and require that, there are definitely better options (or if you can choose to sort on insert). There are some uses of Java 8 streams that can parallelize and provide some different processing options.

But short answer is that you've got locking where needed to prevent issues.

Upvotes: 1

Related Questions