Reputation: 51
How can I parse a large file (about 1.2 GB, 36259190 lines in total)? I want to parse each line into an object and store it in a list, but every time I get an OutOfMemoryError.
    List<Point> points = new ArrayList<>();

    public void m2() throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(DATAFILE))) {
            reader.lines().map(s -> s.split(","))
                    .skip(0)
                    .forEach(p -> points.add(new Point(p[0], p[1], p[2])));
        }
    }

    class Point {
        String X;
        String Y;
        String Z;

        // Constructor used by m2(); each coordinate is kept as text
        Point(String x, String y, String z) {
            X = x;
            Y = y;
            Z = z;
        }
    }
Upvotes: 2
Views: 831
Reputation: 298559
Mind your data types. I’m quite sure that your points do not consist of three text fragments, so define the fields of Point according to their actual type, e.g. int or double. These primitive data types consume significantly less memory than their String representation.
    class Point {
        double x, y, z;

        Point(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }

        // Convenience constructor: parses the text fields of one CSV line
        Point(String x, String y, String z) {
            this.x = Double.parseDouble(x);
            this.y = Double.parseDouble(y);
            this.z = Double.parseDouble(z);
        }
    }
Then read your data file into a list like this:

    public List<Point> m2() throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(DATAFILE))) {
            return reader.lines().map(s -> s.split(","))
                    .map(a -> new Point(a[0], a[1], a[2]))
                    .collect(Collectors.toList());
        }
    }
Then, as noted by others, make sure enough memory is allocated to your JVM. Using the Point class above, you can handle 36 million instances with a heap of ~1½ GiB without problems…
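As a rough plausibility check of that heap figure, the arithmetic can be sketched as below. The layout numbers are assumptions, not something stated in the answer: a 64-bit HotSpot JVM with compressed references, a 12-byte object header, 8-byte object alignment, and 4 bytes per reference in the list's backing array. The class name PointMemoryEstimate is made up for this illustration.

```java
import java.util.Locale;

class PointMemoryEstimate {
    // Assumed layout (64-bit HotSpot, compressed references); other JVMs may differ.
    static final long HEADER_BYTES = 12;   // assumed object header size
    static final long ALIGNMENT = 8;       // assumed object alignment
    static final long REFERENCE_BYTES = 4; // assumed size of one list slot

    // Rounds an object size up to the next multiple of the alignment.
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated bytes for `count` Point objects (three double fields each)
    // plus one reference per object in the list's backing array.
    static long estimateBytes(long count) {
        long perPoint = alignUp(HEADER_BYTES + 3 * Double.BYTES); // 36 -> 40 bytes
        return count * (perPoint + REFERENCE_BYTES);
    }

    public static void main(String[] args) {
        long lines = 36_259_190L; // line count given in the question
        double gib = estimateBytes(lines) / (1024.0 * 1024 * 1024);
        System.out.printf(Locale.ROOT, "~%.2f GiB%n", gib); // ~1.49 GiB
    }
}
```

The estimate ignores spare capacity in the list and the temporary copies made while it grows, so the real peak during loading will be somewhat higher.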
Upvotes: 2
Reputation: 340148
The Answer by Shiro is correct: allocate more memory to Java.
If you cannot afford the memory, then use a database. For example, Postgres or H2.
One of the purposes for a database is to persist data to storage while efficiently handling memory for queries and for loading data as needed.
As you read each line of the data file, store it immediately in the database. Later, query for the records you need. Instantiate objects in memory only for the rows in that query’s result set.
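A minimal sketch of that load step using plain JDBC, under several assumptions: a table named point with numeric columns x, y, z already exists, a JDBC driver for your database is on the classpath, and the batch size of 10,000 is arbitrary. The class name PointLoader and the table and column names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class PointLoader {
    static final int BATCH_SIZE = 10_000; // arbitrary; tune for your database

    // Parses one "x,y,z" line into three doubles.
    static double[] parseLine(String line) {
        String[] parts = line.split(",");
        return new double[] {
            Double.parseDouble(parts[0].trim()),
            Double.parseDouble(parts[1].trim()),
            Double.parseDouble(parts[2].trim())
        };
    }

    // Streams the file line by line and inserts rows in batches, so only
    // one line and one pending batch are held in memory at a time.
    static long load(Path file, Connection conn) throws IOException, SQLException {
        String sql = "INSERT INTO point (x, y, z) VALUES (?, ?, ?)"; // hypothetical table
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            String line;
            while ((line = reader.readLine()) != null) {
                double[] p = parseLine(line);
                stmt.setDouble(1, p[0]);
                stmt.setDouble(2, p[1]);
                stmt.setDouble(3, p[2]);
                stmt.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    stmt.executeBatch(); // send a full batch
                }
            }
            if (count % BATCH_SIZE != 0) {
                stmt.executeBatch(); // send the remaining rows
            }
        }
        return count;
    }
}
```

To use it, obtain a Connection from DriverManager.getConnection with your database's JDBC URL and pass it to load together with the path of the data file.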
Upvotes: 1
Reputation:
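You need to use the JVM command-line options -Xms (initial heap size) and -Xmx (maximum heap size).
Examples:
-Xmx4G (4GB)
-Xmx200M (200MB)
The options must come before -jar; anything after the jar name is passed to the program as an argument instead of to the JVM:
java -Xmx8G -jar program.jar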
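To confirm that the heap setting actually reached the JVM, the running program can report its own limit. This is a small sketch using Runtime.maxMemory(); the class name HeapCheck is made up for the example, and the reported value is usually a little below the -Xmx value.

```java
import java.util.Locale;

class HeapCheck {
    // Returns the maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Formats the limit for logging.
    static String describeMaxHeap() {
        return String.format(Locale.ROOT, "max heap: %d MiB", maxHeapMiB());
    }

    public static void main(String[] args) {
        System.out.println(describeMaxHeap());
    }
}
```

Run it once with and once without the option, for example java -Xmx4G HeapCheck, and compare the two numbers.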
Upvotes: 1