J.L

Reputation: 592

How to read from a huge file and write to a new file in Java

I am reading a file line by line, formatting every line, and then writing the result to a new file. The problem is that the file is huge, nearly 178 MB, and I always get the error message: IO console updater error, java heap space. Here is my code:

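import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class fileFormat {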
    public static void main(String[] args) throws IOException{

        String strLine;

        FileInputStream fstream = new FileInputStream("train_final.txt");
        BufferedReader reader = new BufferedReader(new InputStreamReader(fstream));
        BufferedWriter writer = new BufferedWriter(new FileWriter("newOUTPUT.txt"));

        while((strLine = reader.readLine()) != null){
            List<String> numberBox = new ArrayList<String>();
            StringTokenizer st = new StringTokenizer(strLine);
            while(st.hasMoreTokens()){
                numberBox.add(st.nextToken());
            }
            for (int i=1; i< numberBox.size(); i++){
                String head = numberBox.get(0);
                String tail = numberBox.get(i);
                String line = head + "  "+tail ;
                System.out.println(line);
                writer.write(line);
                writer.newLine();
            }
            numberBox.clear();
        }
        reader.close();
        writer.close();
    }
}

How can I avoid this error message? I have already set the VM option -Xms1024m.

Upvotes: 0

Views: 246

Answers (4)

Ashish Pancholi

Reputation: 4649

If you read the whole file into memory, it will fill up the heap, so a better option is to read the file in chunks. The method below reads the given number of lines, continuing from the line number passed in as the start offset, and returns the line number where it stopped.

Please remember: you can use any collection to store the lines you read; just clear the collection before calling this method to read the next chunk.

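import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;

FileInputStream fis = new FileInputStream(file);
InputStreamReader streamReader = new InputStreamReader(fis, "UTF-8");
LineNumberReader reader = new LineNumberReader(streamReader);

// Call the method below repeatedly, passing its return value as the next start line,
// until it returns the same line number it was given (end of file reached).

public int getParsedLines(LineNumberReader reader, int iLineNumber_Start, int iNumberOfLinesToBeRead) {
    // Start at iLineNumber_Start so that an immediate end of file reports "no progress".
    int iLineNumber_End = iLineNumber_Start;

    int iReadUptoLines = iLineNumber_Start + iNumberOfLinesToBeRead;

    try {
        // LineNumberReader only reads forward: setLineNumber() resets the counter but
        // does not seek, so the reader must already be positioned at iLineNumber_Start.
        reader.setLineNumber(iLineNumber_Start);
        while (iLineNumber_End < iReadUptoLines) {
            String str = reader.readLine();
            if (str == null) {
                break; // end of file
            }
            // your code: process str here

            iLineNumber_End = reader.getLineNumber();
        }
    } catch (IOException ex) {
        // exception handling
    }
    return iLineNumber_End;
}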
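A self-contained sketch of the driver loop that calls such a method repeatedly and clears the collection between chunks. It assumes a hypothetical chunk size of 2 lines and an in-memory `StringReader` standing in for the file; `ChunkedReadDemo`, `readChunk`, `readAll` and `CHUNK_SIZE` are illustrative names, not part of the answer.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ChunkedReadDemo {
    // Hypothetical chunk size; a real run would use something much larger.
    static final int CHUNK_SIZE = 2;

    // Reads up to `count` lines into `out`, continuing from the reader's current position.
    // Returns the line number reached; equals `start` when nothing was left to read.
    static int readChunk(LineNumberReader reader, int start, int count, List<String> out) throws IOException {
        int end = start;
        reader.setLineNumber(start); // resets the counter only; it does not seek
        while (end < start + count) {
            String line = reader.readLine();
            if (line == null) {
                break; // end of input
            }
            out.add(line);
            end = reader.getLineNumber();
        }
        return end;
    }

    // Splits the text into chunks of CHUNK_SIZE lines, reusing one buffer for every chunk.
    static List<List<String>> readAll(String text) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            int start = 0;
            while (true) {
                buffer.clear(); // clear before reading the next chunk
                int end = readChunk(reader, start, CHUNK_SIZE, buffer);
                if (end == start) {
                    break; // no progress: end of file
                }
                chunks.add(new ArrayList<>(buffer)); // copy only so the demo can show each chunk
                start = end;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(readAll("a\nb\nc\nd\ne"));
    }
}
```

Running `main` prints `[[a, b], [c, d], [e]]`. In the real program you would process each chunk in place instead of copying it, so only one chunk is ever held in memory.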

Upvotes: 0

Claudiu

Reputation: 1489

This part of the code:

       for (int i=1; i< numberBox.size(); i++){
            String head = numberBox.get(0);
            String tail = numberBox.get(i);
            String line = head + "  "+tail ;
            System.out.println(line);
            writer.write(line);
            writer.newLine();
       }

Can be translated to:

       if (!numberBox.isEmpty()) { // blank lines have no head token
            String head = numberBox.get(0);
            for (int i=1; i< numberBox.size(); i++){
                 String tail = numberBox.get(i);
                 System.out.print(head);
                 System.out.print("  ");
                 System.out.println(tail);
                 writer.write(head);
                 writer.write("  ");
                 writer.write(tail);
                 writer.newLine();
            }
       }

This adds a little code duplication, but it avoids building a new concatenated String for every output line.

Also, if you merge this for loop with the loop that constructs numberBox, you won't need the numberBox list at all.
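A sketch of that merge, written as a hypothetical helper `formatLine` (the class name `MergedLoop` is also illustrative). It reads the head token once and then writes each remaining token straight to the writer, keeping the question's two-space separator and dropping the console output.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.StringTokenizer;

public class MergedLoop {
    // Writes "head  tail" for every token after the first, without building a list.
    static void formatLine(String strLine, BufferedWriter writer) throws IOException {
        StringTokenizer st = new StringTokenizer(strLine);
        if (!st.hasMoreTokens()) {
            return; // blank line: nothing to write
        }
        String head = st.nextToken();
        while (st.hasMoreTokens()) {
            writer.write(head);
            writer.write("  ");
            writer.write(st.nextToken());
            writer.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(out)) {
            formatLine("1 10 20 30", writer);
        }
        System.out.print(out); // 1  10 / 1  20 / 1  30, one per line
    }
}
```

In the question's program, the body of the outer `while` loop would simply become `formatLine(strLine, writer);`.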

Upvotes: 0

Thilo

Reputation: 262494

The program looks okay. I suspect the problem is that you run it inside Eclipse, and System.out is collected by Eclipse in memory (to be displayed in the Console window).

 System.out.println(line);

Try running it outside Eclipse, change the Eclipse settings to redirect System.out somewhere else, or remove the line.
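If you want to keep the output but not in the Eclipse console, one option (a sketch, not something this answer spells out) is to redirect `System.out` to a file from inside the program with `System.setOut`. The class name `RedirectStdout`, the helper `redirectTo` and the file name `console.log` are placeholders.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

public class RedirectStdout {
    // Points System.out at `fileName` and returns the previous stream so it can be restored.
    // The file stream is buffered and not auto-flushed, so it must be closed when done.
    static PrintStream redirectTo(String fileName) throws IOException {
        PrintStream original = System.out;
        System.setOut(new PrintStream(new BufferedOutputStream(new FileOutputStream(fileName)), false));
        return original;
    }

    public static void main(String[] args) throws IOException {
        PrintStream console = redirectTo("console.log"); // placeholder file name
        System.out.println("this line goes to console.log");
        System.out.close();     // flushes and closes the file
        System.setOut(console); // restore the real console
        System.out.println("back on the console");
    }
}
```

Because the lines go to a file instead of the Console view, Eclipse no longer has to keep them in memory.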

Upvotes: 0

Daniel S.

Reputation: 6640

Remove the line

System.out.println(line);

This works around the failing console updater, which otherwise runs out of memory.
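Putting this together, a sketch of the question's program with the console print removed and both files managed by try-with-resources. The file names come from the question; the class name `FileFormat`, the `format` method and the UTF-8 charset are assumptions (pass the file's real charset if it is not UTF-8, since `Files.newBufferedReader` rejects malformed input).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.StringTokenizer;

public class FileFormat {
    // Streams `in` line by line and writes one "head  tail" line per extra token.
    // Only the current line is held in memory, and nothing is printed to the console.
    static void format(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String strLine;
            while ((strLine = reader.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(strLine);
                if (!st.hasMoreTokens()) {
                    continue; // skip blank lines
                }
                String head = st.nextToken();
                while (st.hasMoreTokens()) {
                    writer.write(head);
                    writer.write("  ");
                    writer.write(st.nextToken());
                    writer.newLine();
                }
            }
        } // both streams are closed here, even if an exception is thrown
    }

    public static void main(String[] args) throws IOException {
        format(Paths.get("train_final.txt"), Paths.get("newOUTPUT.txt"));
    }
}
```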

Upvotes: 3
