Reputation: 1378
This may be a duplicate of questions found on many other pages, but I would still like some expert opinions.
I need to read a large file containing customer records (name, age, income).
I need to process this data quickly in my application and build a dashboard showing, for example, employees of similar ages, income groups (within ranges), and so on.
Now the challenge is reading the large file. I explicitly set my program's heap to 512 MB and used the InputStream and Scanner classes (based on my understanding, these classes do not load the whole file into memory; please correct me if I am wrong). I am able to read a file with 7,590,912 records (a 250 MB file), but storing those records in my ArrayList<Employee> causes continuous spikes in garbage-collector activity (which is expected). To shrink the Employee object, I reduced it to only three fields: name (char[]), age (int), income (float).

In the end my program is very slow, which is not acceptable. Any suggestions to improve performance other than increasing the memory? (Keep in mind that I will perform more operations over the collection.)
EDIT: I am now using an H2 database to flush the data read from the file, inserting in batches of 10,000 records (memory is still 512 MB), but the program is painfully slow. It does manage to stay alive for a while (until ~300K records), with 470 MB of heap used.
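Roughly, the batching approach looks like this (the file name, table schema, and JDBC URL are simplified placeholders); the important details are disabling auto-commit and committing once per batch:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class CsvToH2 {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./employees", "sa", "");
             BufferedReader reader = new BufferedReader(new FileReader("employees.csv"))) {
            conn.setAutoCommit(false); // commit once per batch, not per row
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS employee(name VARCHAR(100), age INT, income REAL)");
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO employee(name, age, income) VALUES (?, ?, ?)")) {
                String line;
                int count = 0;
                while ((line = reader.readLine()) != null) {
                    String[] f = line.split(",");
                    ps.setString(1, f[0]);
                    ps.setInt(2, Integer.parseInt(f[1]));
                    ps.setFloat(3, Float.parseFloat(f[2]));
                    ps.addBatch();
                    if (++count % 10_000 == 0) { // flush every 10,000 records
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch(); // flush the final partial batch
                conn.commit();
            }
        }
    }
}
```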
Pedantic
Upvotes: 0
Views: 779
Reputation: 372
I suggest you use HyperSQL (HSQLDB). HyperSQL is written in Java and offers a small, fast, multithreaded, transactional database engine with in-memory and disk-based tables, supporting both embedded and server modes. In embedded mode it doesn't require a running server and can easily be bundled with any Java application, since it consists of only three files. Using the JDBC driver you can easily connect to the database and take advantage of the full power of SQL.
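For example, a minimal embedded setup might look like this (the file path and table layout are just illustrative); a CACHED table keeps rows on disk so they don't have to fit in your heap, and SQL aggregation can replace the in-memory grouping:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HsqldbExample {
    public static void main(String[] args) throws Exception {
        // Embedded (in-process) HyperSQL database stored in ./data/employees.*
        try (Connection conn = DriverManager.getConnection("jdbc:hsqldb:file:data/employees", "SA", "");
             Statement stmt = conn.createStatement()) {
            // CACHED tables are disk-backed rather than held fully in memory
            stmt.execute("CREATE CACHED TABLE IF NOT EXISTS employee(name VARCHAR(100), age INT, income REAL)");
            stmt.execute("INSERT INTO employee VALUES ('Alice', 34, 52000.0)");
            // Let the database do the grouping for the dashboard
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT age, COUNT(*) FROM employee GROUP BY age")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + " -> " + rs.getInt(2));
                }
            }
        }
    }
}
```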
Upvotes: 1
Reputation: 1588
As you read a record (or some number of records), you need to write them somewhere, such as a database, so they don't stay in memory. Even though the Scanner (or whatever else you are using) doesn't force the values to stay in memory, storing them in a List does. The Spring Batch framework is perfect for solving this problem.
If you aren't willing to incorporate a framework, you will need to do a lot of the plumbing work yourself. I recommend reading in, say, 1000 records, writing them out, clearing your List, and then reading the next 1000 (see the sketch below). Make the number of records read at a time a variable so you can experiment with different values. Spring Batch calls this a chunk.
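A rough sketch of that loop (the Employee class, file name, and writeChunk persistence step are placeholders for your own code):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ChunkedReader {
    private static final int CHUNK_SIZE = 1000; // tune this value

    public static void main(String[] args) throws Exception {
        List<Employee> chunk = new ArrayList<>(CHUNK_SIZE);
        try (Scanner scanner = new Scanner(new File("employees.csv"))) {
            while (scanner.hasNextLine()) {
                String[] f = scanner.nextLine().split(",");
                chunk.add(new Employee(f[0], Integer.parseInt(f[1]), Float.parseFloat(f[2])));
                if (chunk.size() == CHUNK_SIZE) {
                    writeChunk(chunk); // e.g. a JDBC batch insert
                    chunk.clear();     // let the GC reclaim the processed records
                }
            }
            if (!chunk.isEmpty()) {
                writeChunk(chunk);     // flush the final partial chunk
            }
        }
    }

    static void writeChunk(List<Employee> chunk) {
        // placeholder: persist the chunk to a database here
    }

    static class Employee {
        final String name;
        final int age;
        final float income;
        Employee(String name, int age, float income) {
            this.name = name;
            this.age = age;
            this.income = income;
        }
    }
}
```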
Upvotes: 2