Jack J

Reputation: 1644

Memory issue when transferring a large CSV file to a database

I am trying to read a 30GB CSV file with 60 columns and over 78 million rows into an H2 database on disk.

H2 URL: jdbc:h2:file:./data/datasource

Simplified code I am using to transfer the file:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import org.springframework.stereotype.Service;

@Service
public class RecordService {
    private static final int BATCH_SIZE = 1_000_000;
    private final RecordDao recordDao;

    // Constructor injection; Spring supplies the RecordDao bean.
    public RecordService(RecordDao recordDao) {
        this.recordDao = recordDao;
    }

    public void readAndWriteToH2() {
        String path = "path/to/csv-file.csv";
        List<RecordEntity> records = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line = br.readLine();  // skip header line
            while ((line = br.readLine()) != null) {
                String[] split = line.split(",");  // split() takes a String regex, not a char
                RecordEntity record = new RecordEntity();
                record.prop0 = split[0];
                record.prop1 = split[1];
                record.prop2 = split[2];
                records.add(record);

                if (records.size() >= BATCH_SIZE) {
                    recordDao.saveAll(records);  // Spring/Hibernate .saveAll()
                    records = new ArrayList<>();  // start a new batch
                }
            }
            if (!records.isEmpty()) {
                recordDao.saveAll(records);  // save the final partial batch
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I run out of heap memory after 16 million records have been written, which means recordDao.saveAll(records) has already completed successfully 15 times before the failure. What I don't understand is why memory runs out at that point. Are the lines read by the BufferedReader not being discarded once they have been used? Is Hibernate holding on to all of the records after recordDao.saveAll(records) returns?

java.lang.OutOfMemoryError: Java heap space
at java.base/java.lang.StringLatin1.newString(StringLatin1.java:752) ~[na:na]
at java.base/java.lang.String.substring(String.java:2839) ~[na:na]
at java.base/java.lang.String.split(String.java:3371) ~[na:na]
at java.base/java.lang.String.split(String.java:3354) ~[na:na]
at java.base/java.lang.String.split(String.java:3447) ~[na:na]
at gp.fake.service.FakeService.setupNationalAddressTableFromCsv(FakeService.java:42) ~[classes/:na]
at gp.fake.control.FakeControl.setupNationalAddressTable(FakeControl.java:20) ~[classes/:na]
at java.base/java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(DirectMethodHandle$Holder) ~[na:na]
at java.base/java.lang.invoke.LambdaForm$MH/0x0000028f71007800.invoke(LambdaForm$MH) ~[na:na]
at java.base/java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder) ~[na:na]
...
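
To check whether the reading loop on its own retains memory, the read-and-batch step can be isolated from Hibernate. Below is a minimal sketch under stated assumptions: BatchedCsvReader and readInBatches are hypothetical names that are not part of my actual code, and the Consumer sink stands in for recordDao.saveAll so the loop can be run with any sink, including one that does nothing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not taken from the original service.
public class BatchedCsvReader {

    /**
     * Reads CSV lines from the source, skips the header line, and hands the
     * parsed rows to the sink in batches of at most batchSize rows.
     * Returns the total number of data rows read.
     */
    public static long readInBatches(Reader source, int batchSize,
                                     Consumer<List<String[]>> sink) throws IOException {
        long total = 0;
        List<String[]> batch = new ArrayList<>(batchSize);
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine(); // skip header line
            String line;
            while ((line = br.readLine()) != null) {
                batch.add(line.split(",", -1)); // -1 keeps trailing empty columns
                total++;
                if (batch.size() >= batchSize) {
                    sink.accept(batch);
                    // Drop this method's reference to the old batch so it can be
                    // collected, provided the sink does not keep it.
                    batch = new ArrayList<>(batchSize);
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // hand over the final partial batch
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n";
        List<Integer> batchSizes = new ArrayList<>();
        long rows = readInBatches(new StringReader(csv), 2, b -> batchSizes.add(b.size()));
        System.out.println(rows + " rows in batches " + batchSizes); // 3 rows in batches [2, 1]
    }
}
```

Running this against the real file with a sink that does nothing would show whether heap usage grows without Hibernate involved. If it stays flat, the records are more likely being retained on the persistence side, for example by the Hibernate session's first-level cache, which keeps managed entities until the session is cleared or closed.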
