Nasir

Reputation: 531

How to insert data as fast as possible with Hibernate

I read a file, create objects from it, and store them in a PostgreSQL database. The file contains 100,000 documents, which I read from the single file, split, and finally store in the database. I can't create a List<> and keep all the documents in it because I have little RAM. My code to read and write to the database is below, but my JVM heap fills up and I cannot continue storing documents. How can I read the file and store it in the database efficiently?

public void readFile() {
    StringBuilder wholeDocument = new StringBuilder();
    try {
        bufferedReader = new BufferedReader(new FileReader(files));
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            if (line.contains("<page>")) {
                wholeDocument.append(line);
                while ((line = bufferedReader.readLine()) != null) {
                    wholeDocument = wholeDocument.append("\n" + line);
                    if (line.contains("</page>")) {
                        System.out.println(count++);
                        addBodyToDatabase(wholeDocument.toString());

                        wholeDocument.setLength(0);
                        break;
                    }
                }
            }
        }
        wikiParser.commit();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            bufferedReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

public void addBodyToDatabase(String wholeContent) {
    Page page = new Page(new Timestamp(System.currentTimeMillis()),
            wholeContent);
    database.addPageToDatabase(page);
}

public static int counter = 1;

public void addPageToDatabase(Page page) {
    session.save(page);
    if (counter % 3000 == 0) {
        commit();
    }
    counter++;
}

Upvotes: 6

Views: 3120

Answers (5)

Vlad Mihalcea

Reputation: 154130

First of all, you should apply a fork-join approach here.

The main task parses the file and sends batches of at most 100 items to an ExecutorService. The ExecutorService should have a number of worker threads equal to the number of available database connections. If you have 4 CPU cores, let's say the database can take 8 concurrent connections without too much context switching.

You should then configure a connection-pooling DataSource with minSize equal to maxSize, both set to 8. Try HikariCP or ViburDBCP for connection pooling.

Then you need to configure JDBC batching. If you're using MySQL, the IDENTITY generator will disable batching. If you're using a database that supports sequences, make sure you also use the enhanced identifier generators (they are the default option in Hibernate 5.x).

This way the entity insert process is parallelized and decoupled from the main parsing thread. The main thread should wait for the ExecutorService to finish processing all tasks prior to shutting down.
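Here is a minimal sketch of that layout. readNextPage() and insertBatch() are hypothetical placeholders for the file parsing and the per-task Hibernate inserts; each task should open its own Session so it uses its own pooled connection.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelImport {

    private static final int BATCH_SIZE = 100; // items handed to each task
    private static final int POOL_SIZE = 8;    // = number of pooled connections

    private final ExecutorService executor = Executors.newFixedThreadPool(POOL_SIZE);

    public void importFile() throws InterruptedException {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        String document;
        // the main thread only parses; the workers do the inserts
        while ((document = readNextPage()) != null) {
            batch.add(document);
            if (batch.size() == BATCH_SIZE) {
                List<String> toInsert = batch;
                executor.submit(() -> insertBatch(toInsert));
                batch = new ArrayList<>(BATCH_SIZE);
            }
        }
        if (!batch.isEmpty()) {
            List<String> rest = batch;
            executor.submit(() -> insertBatch(rest));
        }
        // wait for all insert tasks to finish before shutting down
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.HOURS);
    }

    // hypothetical: returns the next <page>...</page> block, or null at end of file
    private String readNextPage() {
        throw new UnsupportedOperationException("plug in the file parsing here");
    }

    // hypothetical: opens its own Session, saves the batch, and commits
    private void insertBatch(List<String> pages) {
        throw new UnsupportedOperationException("plug in the Hibernate inserts here");
    }
}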

Upvotes: 11

Adrian Shum

Reputation: 40066

Actually it is hard to make suggestions without doing real profiling to find out what is making your code slow or inefficient.

However, there are several things we can see from your code:

  1. You are using StringBuilder inefficiently

wholeDocument.append("\n" + line); should be written as wholeDocument.append("\n").append(line); instead

    Because what you originally wrote will be translated by the compiler to wholeDocument.append(new StringBuilder("\n").append(line).toString()). You can see how many unnecessary StringBuilder instances you are creating :)

  2. Considerations when using Hibernate

    I am not sure how you manage your session or how you implemented your commit(). Assuming you have done that right, there are still more things to consider:

    • Have you properly set up the batch size in Hibernate (hibernate.jdbc.batch_size)? By default the JDBC batch size is very small, around 5, if set at all. You may want to set it to a bigger value so that internally Hibernate sends the inserts in bigger batches.

    • Given that you do not need the entities in the first-level cache for later use, you may want to do an intermittent session.flush() + session.clear() (see the first sketch after this list) to

      1. trigger the batch inserts mentioned in the previous point
      2. clear out the first-level cache
  3. Switch away from Hibernate for this feature.

    Hibernate is cool, but it is not a panacea for everything. Given that in this feature you are just saving records into the DB based on text-file content, and that you need neither entity behavior nor the first-level cache for later processing, there is not much reason to use Hibernate here given its extra processing and space overhead. Simply doing JDBC with manual batch handling (see the second sketch below) is going to save you a lot of trouble.
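
A rough sketch of the flush/clear idea from point 2, assuming the session and a pages list are already in scope and that hibernate.jdbc.batch_size is set to the same value:

int batchSize = 50; // keep in sync with hibernate.jdbc.batch_size
Transaction tx = session.beginTransaction();
for (int i = 0; i < pages.size(); i++) {
    session.save(pages.get(i));
    if (i > 0 && i % batchSize == 0) {
        session.flush(); // push the pending batch of INSERTs to the database
        session.clear(); // evict the saved entities from the first-level cache
    }
}
tx.commit();
session.close();

And for point 3, a minimal sketch of plain JDBC with manual batch handling (the table and column names are made up, a java.sql.Connection named connection is assumed, and exception handling is omitted):

String sql = "INSERT INTO page (created_at, content) VALUES (?, ?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    int count = 0;
    for (String content : pages) {
        ps.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
        ps.setString(2, content);
        ps.addBatch();
        if (++count % 50 == 0) {
            ps.executeBatch(); // send 50 INSERTs in one round trip
        }
    }
    ps.executeBatch(); // flush the remainder
}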

Upvotes: 2

Nasir

Reputation: 531

I used @RookieGuy's answer: stackoverflow.com/questions/14581865/hibernate-commit-and-flush

I used

session.flush();
session.clear();

and finally, after reading all the documents and storing them in the database,

tx.commit();
session.close();

and changed

wholeDocument = wholeDocument.append("\n" + line);

to

wholeDocument.append("\n" + line);
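
Put together, the save path from my question ends up looking roughly like this (the 3000 interval and counter are from the original code; tx is the transaction opened before the import starts):

public void addPageToDatabase(Page page) {
    session.save(page);
    if (counter % 3000 == 0) {
        session.flush(); // send the pending INSERTs to the database
        session.clear(); // free the first-level cache
    }
    counter++;
}

// after the whole file has been read:
tx.commit();
session.close();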

Upvotes: 1

v.ladynev

Reputation: 19976

You should begin a transaction, do the save operations, and then commit the transaction (don't begin a transaction after save!). You can also try StatelessSession to avoid the memory consumed by the first-level cache.

And use a smaller value, for example 20, in this code:

if (counter % 20 == 0)

You can also try to pass the StringBuilder as a method argument as far as possible, instead of converting it to a String.
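
A minimal sketch of the StatelessSession variant, assuming a configured SessionFactory and the Page entity from the question:

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
try {
    for (Page page : pages) {
        session.insert(page); // no first-level cache, no dirty checking
    }
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
} finally {
    session.close();
}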

Upvotes: 0

misbah

Reputation: 189

I'm not very sure about the structure of your data file. It would be easier to understand if you could provide a sample of it.

The root cause of the memory consumption is the way you read/iterate the file. Once something gets read, it stays in memory. You should rather use either java.io.FileInputStream or org.apache.commons.io.FileUtils.

Here is sample code to iterate with java.io.FileInputStream:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;

try (
        FileInputStream inputStream = new FileInputStream("/tmp/sample.txt");
        Scanner sc = new Scanner(inputStream, "UTF-8")
) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        addBodyToDatabase(line);
    }
    // Scanner swallows IOExceptions; surface any that occurred while reading
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

Here is sample code to iterate with org.apache.commons.io.FileUtils:

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

File file = new File("/tmp/sample.txt");
try {
    // lineIterator streams the file line by line instead of loading it whole
    LineIterator it = FileUtils.lineIterator(file, "UTF-8");
    try {
        while (it.hasNext()) {
            String line = it.nextLine();
            addBodyToDatabase(line);
        }
    } finally {
        LineIterator.closeQuietly(it);
    }
} catch (IOException e) {
    e.printStackTrace();
}

Upvotes: 0
