Lostsoul

Reputation: 26067

Memory error while writing large queued data to file

I'm not sure how to deal with this (I'm new to Java). I have a program that generates more data than fits in memory (for example, 10 GB of data with 4 GB of RAM). I decided to start a thread that takes the data and writes it to disk. Although I know disk writes will never keep up with the process generating the data, I was hoping my application would simply be bound by how quickly I can write to disk. Instead, after a while I get heap OutOfMemoryError errors.

Here are the parts I think are relevant. All data to be written is put in this variable:

private static Queue<short[]> result =  new LinkedList <short[]> ();

Here's the part that saves to file:

   static class SaveToFile extends Thread {


        public void run() {
                FileWriter bw = null;
                try {
                    bw = new FileWriter("output.csv");
                    Thread.sleep(500); //delay the start so the queue can have some data
                } catch (IOException e1) {
                    // TODO Auto-generated catch block
                    e1.printStackTrace();
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }

            System.out.println("size of results during execution is " + result.size());
            while(!result.isEmpty()) {
                short[] current = result.poll();
                try {
                    bw.write(Arrays.toString(current) + "," + "\n");
                } catch (IOException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
            try {
                bw.flush();
                bw.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            System.out.println("file writing is done");
        }
    }

I'm not sure what I'm doing wrong. Do I need to block the result's queue at a certain size so my process stops writing to it? Or am I doing something wrong when writing to the file? I am showing the non-buffered version, but I have tried BufferedWriter with the same result. I have also observed that while the program is running the file size is 0; only once it crashes does it seem to write. Is it holding the data in memory even without BufferedWriter, and could that be causing the memory issue?

My idea was that as the SaveToFile thread clears the queue, it makes room for the other thread to keep writing to it (these are the only threads I'm running: the main program and SaveToFile).

Upvotes: 1

Views: 478

Answers (2)

NPE

Reputation: 500893

do I need to block the result's queue at a certain size so my process stops writing to it?

Yes, you do. The producer generating data faster than it can be written out is the most likely cause of your process running out of memory.

Another issue is that LinkedList is not synchronized, so you need to use locking when using a LinkedList to pass data between threads.

To limit the capacity, you can use ArrayBlockingQueue or LinkedBlockingQueue. As an added bonus, both are thread-safe and thus won't require external synchronization.
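A minimal sketch of the bounded-queue approach, assuming an ArrayBlockingQueue in place of the LinkedList. The class name BoundedQueueSketch, the DONE end-of-data marker, and the StringWriter standing in for the FileWriter are illustrative assumptions, not part of the original code. put() blocks the producer while the queue is full, and take() blocks the consumer while it is empty, so the consumer no longer exits just because the queue is momentarily empty.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedQueueSketch {
    // Hypothetical end-of-data marker; it is compared by reference, never by content.
    private static final short[] DONE = new short[0];

    /** Consumer: writes rows until it sees DONE, then flushes. Returns rows written. */
    static int drain(BlockingQueue<short[]> queue, Writer out)
            throws IOException, InterruptedException {
        int rows = 0;
        while (true) {
            short[] current = queue.take(); // blocks while the queue is empty
            if (current == DONE) {
                break;
            }
            out.write(Arrays.toString(current));
            out.write(",\n");
            rows++;
        }
        out.flush();
        return rows;
    }

    /** Producer: put() blocks whenever the queue is already at capacity. */
    static void produce(BlockingQueue<short[]> queue, int rows) throws InterruptedException {
        for (int i = 0; i < rows; i++) {
            queue.put(new short[] {(short) i, (short) (i + 1)}); // made-up row data
        }
        queue.put(DONE); // tell the consumer that no more rows are coming
    }

    /** Runs one producer thread against a consumer on the calling thread. */
    static String run(int rows, int capacity) {
        BlockingQueue<short[]> queue = new ArrayBlockingQueue<>(capacity);
        StringWriter sink = new StringWriter(); // stand-in for the FileWriter
        Thread producer = new Thread(() -> {
            try {
                produce(queue, rows);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            drain(queue, sink);
            producer.join();
        } catch (IOException | InterruptedException e) {
            producer.interrupt();
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(5, 2));
    }
}
```

With a capacity of 2, at most two rows wait in memory at any time, however many rows the producer generates in total.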

Finally, if your code is I/O-bound, as it appears to be, you will probably get relatively little benefit from splitting it into two threads. This is worth bearing in mind, since it could be that you're introducing all this extra complexity for little or no benefit.
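If the second thread does turn out to buy little, one single-threaded alternative is to write each row as soon as it is generated, so that nothing accumulates in a queue. The sketch below assumes a hypothetical generateRow method standing in for the real data generation, and a StringWriter standing in for the output file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;

class DirectWriteSketch {
    // Hypothetical stand-in for whatever produces one row of data.
    static short[] generateRow(int i) {
        return new short[] {(short) i, (short) (i * 2)};
    }

    /** Generates and writes rows one at a time; only the current row is held in memory. */
    static void writeRows(Writer out, int rows) throws IOException {
        BufferedWriter bw = new BufferedWriter(out);
        for (int i = 0; i < rows; i++) {
            bw.write(Arrays.toString(generateRow(i)));
            bw.write(",\n");
        }
        bw.flush(); // push any buffered characters to the underlying writer
    }

    static String demo(int rows) {
        StringWriter sink = new StringWriter(); // stand-in for new FileWriter("output.csv")
        try {
            writeRows(sink, rows);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo(3));
    }
}
```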

Upvotes: 1

rlinden

Reputation: 2041

As you have already stated, your disk writer is slower than your in-memory producer. Hence, I believe that you will never reach the flush, because result will never be empty.

I believe the best approach would be to create a class that contains a queue and enforces a maximum queue size. Then, if the producer tried to enqueue something while the queue was full, it would be blocked.

I suggest that your enqueue method not busy-wait, but instead sleep until it receives a signal from your dequeue method.
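One way such a class could look, as a sketch only: the names BoundedBuffer, enqueue, dequeue, and BoundedBufferDemo are assumptions made for illustration. It uses Java's built-in monitor methods wait() and notifyAll(), so a blocked thread sleeps rather than spinning. The standard library's ArrayBlockingQueue already provides the same behaviour.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BoundedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>(); // note: ArrayDeque rejects null items
    private final int maxSize;

    BoundedBuffer(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    /** Blocks (without busy waiting) while the buffer is full. */
    synchronized void enqueue(T item) throws InterruptedException {
        while (items.size() >= maxSize) {
            wait(); // releases the lock and sleeps until notified
        }
        items.addLast(item);
        notifyAll(); // wake any consumer waiting for data
    }

    /** Blocks (without busy waiting) while the buffer is empty. */
    synchronized T dequeue() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        T item = items.removeFirst();
        notifyAll(); // wake any producer waiting for free space
        return item;
    }
}

class BoundedBufferDemo {
    /** Pushes 0..count-1 through a buffer of the given capacity and returns their sum. */
    static long sumThrough(int count, int capacity) {
        BoundedBuffer<Integer> buffer = new BoundedBuffer<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    buffer.enqueue(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += buffer.dequeue();
            }
            producer.join();
        } catch (InterruptedException e) {
            producer.interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumThrough(100, 4));
    }
}
```

The wait() calls sit inside while loops rather than if statements because a woken thread has to re-check its condition: another thread may have changed the buffer first, and spurious wakeups are permitted.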

Upvotes: 1
