Reputation: 45
I have written this code:
try(BufferedReader file = new BufferedReader(new FileReader("C:\\Users\\User\\Desktop\\big50m.txt"));){
String line;
StringTokenizer st;
while ((line = file.readLine()) != null){
st = new StringTokenizer(line); // Separation of integers of the file line
while(st.hasMoreTokens())
numbers.add(Integer.parseInt(st.nextToken())); //Converting and adding to the list of numbers
}
}
catch(Exception e){
System.out.println("Can't read the file...");
}
The big50m file contains 50,000,000 integers, and I get this runtime error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuffer.append(StringBuffer.java:367)
at java.io.BufferedReader.readLine(BufferedReader.java:370)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at unsortedfilesapp.UnsortedFilesApp.main(UnsortedFilesApp.java:37)
C:\Users\User\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 5 seconds)
I think the problem is the String variable named line. Can you tell me how to fix it? I use StringTokenizer because I want fast reading.
Upvotes: 0
Views: 1827
Reputation: 9131
Since all the numbers are on a single line, the BufferedReader approach does not work or scale well: readLine() has to read the complete file into memory. The streaming approach (e.g. from @whbogado) is therefore the way to go.
StreamTokenizer tokenizer = new StreamTokenizer(new FileReader("bigfile.txt"));
tokenizer.parseNumbers(); // default behaviour
while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
numbers.add((int)Math.round(tokenizer.nval));
}
}
You write that you still get a heap space error, so I assume the streaming itself is no longer the problem. The remaining issue is that you store all values in a List. You say in a comment that you do not know the actual count of numbers, so you should avoid storing them in a list and process them in a streaming fashion as well.
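To illustrate what "streaming" the values could look like, here is a minimal sketch that consumes each number as it is parsed and keeps only running aggregates. The class name StreamingStats and the chosen aggregates (count, sum, min, max) are assumptions for illustration; the question does not say what is done with the numbers afterwards.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical example class: keeps running aggregates instead of a List.
final class StreamingStats {
    long count;
    long sum;
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;

    // Reads whitespace-separated integers and updates the aggregates.
    // Memory use does not grow with the number of values read.
    static StreamingStats scan(Reader in) throws IOException {
        StreamingStats stats = new StreamingStats();
        StreamTokenizer tokenizer = new StreamTokenizer(in);
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                int value = (int) tokenizer.nval;
                stats.count++;
                stats.sum += value;
                stats.min = Math.min(stats.min, value);
                stats.max = Math.max(stats.max, value);
            }
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        // A small in-memory sample stands in for the big file here.
        StreamingStats s = scan(new StringReader("3 -7 42 0"));
        System.out.println(s.count + " numbers, sum=" + s.sum
                + ", min=" + s.min + ", max=" + s.max);
    }
}
```

For the real file you would pass something like new BufferedReader(new FileReader(path)) instead of the StringReader. If the later processing really needs all values at once (sorting, for example), this approach alone is not enough.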
For all who are interested, here is my little test code (Java 8) that produces a test file of the needed size, USED_INT_VALUES. I limited it to 5,000,000 integers for now. As you can see when running it, memory usage increases steadily while reading through the file. The only place that holds that much memory is the numbers List.
Be aware that initializing an ArrayList with an initial capacity allocates only the backing array of references, not the memory the stored objects need, in your case the Integers.
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.logging.Level;
import java.util.logging.Logger;
public class TestBigFiles {
public static void main(String args[]) throws IOException {
heapStatistics("program start");
final int USED_INT_VALUES = 5000000;
File tempFile = File.createTempFile("testdata_big_50m", ".txt");
System.out.println("using file " + tempFile.getAbsolutePath());
tempFile.deleteOnExit();
Random rand = new Random();
FileWriter writer = new FileWriter(tempFile);
rand.ints(USED_INT_VALUES).forEach(i -> {
try {
writer.write(i + " ");
} catch (IOException ex) {
Logger.getLogger(TestBigFiles.class.getName()).log(Level.SEVERE, null, ex);
}
});
writer.close();
heapStatistics("large file generated - size=" + tempFile.length() + "Bytes");
List<Integer> numbers = new ArrayList<>(USED_INT_VALUES);
heapStatistics("large array allocated (to avoid array copy)");
int c = 0;
try (FileReader fileReader = new FileReader(tempFile);) {
StreamTokenizer tokenizer = new StreamTokenizer(fileReader);
while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
numbers.add((int) tokenizer.nval);
c++;
}
if (c % 100000 == 0) {
heapStatistics("within loop count " + c);
}
}
}
heapStatistics("large file parsed, number list size is " + numbers.size());
}
private static void heapStatistics(String message) {
int MEGABYTE = 1024 * 1024;
//clean up unused stuff
System.gc();
Runtime runtime = Runtime.getRuntime();
System.out.println("##### " + message + " #####");
System.out.println("Used Memory:" + (runtime.totalMemory() - runtime.freeMemory()) / MEGABYTE + "MB"
+ " Free Memory:" + runtime.freeMemory() / MEGABYTE + "MB"
+ " Total Memory:" + runtime.totalMemory() / MEGABYTE + "MB"
+ " Max Memory:" + runtime.maxMemory() / MEGABYTE + "MB");
}
}
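If all values do have to stay in memory, one way around the per-element Integer objects mentioned before the test program is to store primitives. The following is a minimal sketch of a growable int array; the class name GrowableIntArray and the 50% growth factor are assumptions for illustration, not something taken from the answers here. An int takes 4 bytes, so 50,000,000 values need roughly 200 MB of array space, whereas a List<Integer> additionally needs one reference per element plus the Integer objects themselves (the exact overhead depends on the JVM).

```java
import java.util.Arrays;

// Hypothetical helper: a minimal growable array of primitive ints.
final class GrowableIntArray {
    private int[] data;
    private int size;

    GrowableIntArray(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    // Appends a value, growing the backing array by about 50% when full.
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length + (data.length >> 1) + 1);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    // Returns a copy trimmed to the number of stored values.
    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

In the test program, numbers.add((int) tokenizer.nval) would then work unchanged with numbers declared as a GrowableIntArray. Note that each growth step briefly needs both the old and the new array, so a good initial capacity still helps.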
Upvotes: 0
Reputation: 61
When I ran the program with -Xmx2048m, the provided snippet worked, with one adjustment: numbers was declared as List<Integer> numbers = new ArrayList<>(50000000);
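To check that a heap setting such as -Xmx2048m actually reached the JVM (an IDE may launch the program with its own options), a small sketch like the following can print the limit at startup. The class name HeapCheck is an assumption for illustration.

```java
// Hypothetical helper: reports the maximum heap the JVM will try to use.
final class HeapCheck {
    static long maxHeapMegabytes() {
        // maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

The reported value is usually somewhat below the figure given with -Xmx, so treat it as an approximate confirmation rather than an exact match.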
Upvotes: 0
Reputation: 870
Here is a version that minimizes memory usage: no byte-to-char conversion and no String operations. This version does not handle negative numbers, though.
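import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;

// The class name is illustrative only.
public class ReadBigFile {
    public static void main(final String[] a) {
        // Note: a Set keeps each distinct value only once.
        final Set<Integer> number = new HashSet<>();
        int v = 0;           // value of the number currently being parsed
        boolean use = false; // true once at least one digit has been seen
        int c;
        // Reading raw bytes avoids char conversion; buffering avoids one OS read per byte.
        try (InputStream s = new BufferedInputStream(
                new FileInputStream("C:\\Users\\User\\Desktop\\big50m.txt"))) {
            // No allocation per byte; only add() boxes the finished number.
            while ((c = s.read()) != -1) {
                if (c >= '0' && c <= '9') { v = v * 10 + (c - '0'); use = true; continue; }
                if (use) number.add(v);
                use = false;
                v = 0;
            }
            if (use) number.add(v); // last number when the file does not end with a separator
        } catch (final Exception e) {
            System.out.println("Can't read the file...");
        }
    }
}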
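Since the version above skips minus signs, here is a hedged sketch of how the same byte-level loop could be extended to negative numbers. The class name SignedIntScanner and the IntConsumer callback are assumptions for illustration. A '-' is treated as a sign only when a digit follows it immediately, and integer overflow is not checked.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

// Hypothetical variant of the byte-level parser that also accepts a leading '-'.
final class SignedIntScanner {
    // Passes every parsed integer to sink and returns how many were found.
    static long scan(InputStream in, IntConsumer sink) throws IOException {
        long count = 0;
        int value = 0;
        boolean inNumber = false; // at least one digit of the current number seen
        boolean negative = false; // the byte just before the digits was '-'
        int c;
        while ((c = in.read()) != -1) {
            if (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                inNumber = true;
            } else {
                if (inNumber) {
                    sink.accept(negative ? -value : value);
                    count++;
                }
                // Remember a '-' only until the next byte; any other byte clears it.
                negative = (c == '-');
                inNumber = false;
                value = 0;
            }
        }
        if (inNumber) { // input ended directly after a digit
            sink.accept(negative ? -value : value);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "12 -5 007\n-30".getBytes(StandardCharsets.US_ASCII);
        long n = scan(new ByteArrayInputStream(sample), System.out::println);
        System.out.println(n + " numbers read");
    }
}
```

For a real file, the InputStream would be a FileInputStream wrapped in a BufferedInputStream, and the sink decides whether the values are stored or processed on the fly.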
Upvotes: 0
Reputation: 947
The readLine() method reads the whole line at once, which eats up a lot of memory when the file is one huge line. This is highly inefficient and does not scale to an arbitrarily big file.
You can use a StreamTokenizer like this:
StreamTokenizer tokenizer = new StreamTokenizer(new FileReader("bigfile.txt"));
tokenizer.parseNumbers(); // default behaviour
while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
numbers.add((int)Math.round(tokenizer.nval));
}
}
I have not tested this code but it gives you the general idea.
Upvotes: 0
Reputation: 24630
Create a BufferedReader from the file and read() it char by char. Collect the digit chars into a String, then call Integer.parseInt() on it; skip any non-digit char and continue parsing at the next digit, and so on.
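The steps described above could be sketched roughly as follows. The class name CharByCharParser and the IntConsumer callback are assumptions for illustration, and a reusable StringBuilder stands in for the String that collects the digits. Like the description, it skips every non-digit char, so minus signs are ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.IntConsumer;

// Hypothetical sketch of the char-by-char approach.
final class CharByCharParser {
    static void parse(Reader source, IntConsumer sink) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        StringBuilder digits = new StringBuilder(); // digits of the current number
        int c;
        while ((c = reader.read()) != -1) {
            if (c >= '0' && c <= '9') {
                digits.append((char) c);
            } else if (digits.length() > 0) {
                // A non-digit ends the current number.
                sink.accept(Integer.parseInt(digits.toString()));
                digits.setLength(0);
            }
        }
        if (digits.length() > 0) { // number at the very end of the input
            sink.accept(Integer.parseInt(digits.toString()));
        }
    }

    public static void main(String[] args) throws IOException {
        parse(new StringReader("10 20\t30\n4"), System.out::println);
    }
}
```

Only one short buffer is alive at a time, so memory use stays flat regardless of how long the line is; what the sink does with the values then determines the overall footprint.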
Upvotes: 1