Robert

Reputation: 772

How to increase the performance of inserting data into the database?

I use PostgreSQL 9.5 (with the newest JDBC driver, 9.4.1209), JPA 2.1 (Hibernate), EJB 3.2, CDI, JSF 2.2 and WildFly 10. I have to insert a lot of data into the database (about 1 million to 170 million entities). The number of entities depends on the file the user uploads via the form on the page.

What is the problem?

The problem is that inserting the data into the database is very slow, and the execution time grows with every call to the flush() method. I've added println(...) calls to see how fast flush() executes. For the first ~4 flushes (400,000 entities), I get the println(...) output every ~20 s. After that, flush() becomes incredibly slow, and it keeps getting slower.

Of course, if I remove the flush() and clear() calls, I get the println(...) output every ~1 s, BUT when I approach 3 million entities, I also get the exception:

java.lang.OutOfMemoryError: GC overhead limit exceeded

What have I done so far?

The question is: how can I increase the performance of inserting data into the database?

Here's the structure of my table:

  column_name  |   udt_name  | length | is_nullable |  key
---------------+-------------+--------+-------------+--------
id             |    int8     |        |     NO      |   PK
id_user_table  |    int4     |        |     NO      |   FK
starttime      | timestamptz |        |     NO      |
time           |   float8    |        |     NO      |
sip            |   varchar   |  100   |     NO      |
dip            |   varchar   |  100   |     NO      |
sport          |    int4     |        |     YES     |
dport          |    int4     |        |     YES     |
proto          |   varchar   |   50   |     NO      |
totbytes       |    int8     |        |     YES     |
info           |    text     |        |     YES     |
label          |   varchar   |   10   |     NO      |

Here's part of the EJB bean (first version) where I insert the data into the database:

@Stateless
public class DataDaoImpl extends GenericDaoImpl<Data> implements DataDao {

    /**
     * This is the first method which is executed. 
     * The CDI bean (controller) calls this method.
     * @param list - data from the file.
     * @param idFK - foreign key.
     */
    public void send(List<String> list, int idFK) {

        if(handleCSV(list,idFK)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * The method inserts data into the database.
     */
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    private boolean handleCSV(List<String> list, int idFK){
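        // Note: with container-managed transactions, @TransactionAttribute only
        // applies to business methods invoked through the EJB proxy; since
        // handleCSV(...) is private and called directly from send(...), the
        // REQUIRES_NEW attribute above has no effect here.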

        try{

            long start=0;
            Pattern patternRow=Pattern.compile(",");

            for (String s : list) {

                if(start!=0){

                    String[] data=patternRow.split(s);                  

                    //Preparing data...

                    DataStoreAll dataStore=new DataStoreAll();
                    DataStoreAllId dataId=new DataStoreAllId(start++, idFK);                    
                    dataStore.setId(dataId);

                    //Setting the other object fields...                                                    

                    entityManager.persist(dataStore);               

                    if(start%100000==0){
                        System.out.println("Number of entities: "+start);
                        entityManager.flush();
                        entityManager.clear();                      
                    }
                }
                else start++;
            }                       

        } catch(Throwable t){

            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        }

        return true;
    }

    @Inject
    private EntityManager entityManager;
}

Instead of using Container-Managed Transactions, I've also tried Bean-Managed Transactions (second version):

@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {
    /**
     * This is the first method which is executed. 
     * The CDI bean (controller) calls this method.
     * @param list - data from the file.
     * @param idFK - foreign key.
     */
    public void send(List<String> list, int idFK) {

        if(handleCSV(list,idFK)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * This method collects the data into a LinkedList.
     */
    private boolean handleCSV(List<String> list, int idFK){

        try{

            long start=0;
            Pattern patternRow=Pattern.compile(",");
            List<DataStoreAll> entitiesAll=new LinkedList<>();

            for (String s : list) {

                if(start!=0){

                    String[] data=patternRow.split(s);                  

                    //Preparing data...

                    DataStoreAll dataStore=new DataStoreAll();
                    DataStoreAllId dataId=new DataStoreAllId(start++, idFK);                    
                    dataStore.setId(dataId);

                    //Setting the other object fields...                                                    

                    entitiesAll.add(dataStore);

                    if(start%100000==0){

                        System.out.println("Number of entities: "+start);
                        saveDataStoreAll(entitiesAll);                      
                    }
                }
                else start++;
            }

        } catch(Throwable t){

            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        }

        return true;
    }

    /**
     * This method persists the entities and commits the transaction.
     */
    private void saveDataStoreAll(List<DataStoreAll> entities) throws EntityExistsException,IllegalArgumentException,TransactionRequiredException,PersistenceException,Throwable {

        Iterator<DataStoreAll> iter=entities.iterator();

        ut.begin();     

        while(iter.hasNext()){

            entityManager.persist(iter.next());
            iter.remove();
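            // Note: flushing and clearing after every single persist() sends
            // each INSERT on its own, so hibernate.jdbc.batch_size=50 never
            // accumulates a batch here.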
            entityManager.flush();
            entityManager.clear();
        }

        ut.commit();
    }

    @Inject
    private EntityManager entityManager;

    @Inject
    private UserTransaction ut;
}

Here's my persistence.xml:

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
   xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="
        http://xmlns.jcp.org/xml/ns/persistence
        http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
   <persistence-unit name="primary">
      <jta-data-source>java:/PostgresDS</jta-data-source>
      <properties>
         <property name="hibernate.show_sql" value="false" />
         <property name="hibernate.jdbc.batch_size" value="50" />         
         <property name="hibernate.order_inserts" value="true" />
         <property name="hibernate.order_updates" value="true" />
         <property name="hibernate.jdbc.batch_versioned_data" value="true"/>
         <property name="reWriteBatchedInserts" value="true"/>         
      </properties>
   </persistence-unit>
</persistence>

If I've forgotten to add anything, let me know and I'll update my post.

Update

Here's the controller which calls DataDaoImpl#send(...):

@Named
@ViewScoped
public class DataController implements Serializable {

    @PostConstruct
    private void init(){

        //...
    }

    /**
     * Handle of the uploaded file.
     */
    public void handleFileUpload(FileUploadEvent event){

        uploadFile=event.getFile();

        try(InputStream input = uploadFile.getInputstream()){

            Path folder=Paths.get(System.getProperty("jboss.server.data.dir"),"upload");

            if(!folder.toFile().exists()){
                if(!folder.toFile().mkdirs()){
                    folder=Paths.get(System.getProperty("jboss.server.data.dir"));
                }
            }

            String filename = FilenameUtils.getBaseName(uploadFile.getFileName()); 
            String extension = FilenameUtils.getExtension(uploadFile.getFileName());
            filePath = Files.createTempFile(folder, filename + "-", "." + extension);

            //Save the file on the server.
            Files.copy(input, filePath, StandardCopyOption.REPLACE_EXISTING);

            //Add reference to the unconfirmed uploaded files list.
            userFileManager.addUnconfirmedUploadedFile(filePath.toFile());

            FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "Success", uploadFile.getFileName() + " was uploaded."));

        } catch (IOException e) {

            //...
        }
    }

    /**
     * Sending data from file to the database.
     */
    public void send(){

        //int idFK=...

        //The model includes the data from the file and other things which I transfer to the EJB bean.
        AddDataModel addDataModel=new AddDataModel();       
        //Setting the addDataModel fields...        

        try{

            if(uploadFile!=null){

                //Each row of the file == 1 entity.
                List<String> list=new ArrayList<String>();

                Stream<String> stream=Files.lines(filePath);
                list=stream.collect(Collectors.toList());

                addDataModel.setList(list);
            }

        } catch (IOException e) {

            //...
        }   

        //Sending data to the DataDaoImpl EJB bean.
        if(dataDao.send(addDataModel,idFK)){

            userFileManager.confirmUploadedFile(filePath.toFile());

            FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "The data was saved in the database.", ""));
        }       
    }

    private static final long serialVersionUID = -7202741739427929050L;

    @Inject
    private DataDao dataDao;

    private UserFileManager userFileManager;
    private UploadedFile uploadFile;
    private Path filePath;
}

Update 2

Here's the updated EJB bean where I insert the data into the database:

@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {

    /**
     * This is the first method which is executed. 
     * The CDI bean (controller) calls this method.
     * @param addDataModel - object which includes path to the uploaded file and other things which are needed.
     */ 
    public void send(AddDataModel addDataModel){

        if(handleCSV(addDataModel)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * The method inserts data into the database.
     */
    private boolean handleCSV(AddDataModel addDataModel){

        PreparedStatement ps=null;
        Connection con=null;

        FileInputStream fileInputStream=null;
        Scanner scanner=null;       

        try{

            con=ds.getConnection();
            con.setAutoCommit(false);

            ps=con.prepareStatement("insert into data_store_all "
                    + "(id,id_user_table,startTime,time,sIP,dIP,sPort,dPort,proto,totBytes,info) "
                    + "values(?,?,?,?,?,?,?,?,?,?,?)");

            long start=0;       

            fileInputStream=new FileInputStream(addDataModel.getPath().toFile());
            scanner=new Scanner(fileInputStream, "UTF-8");

            Pattern patternRow=Pattern.compile(",");            
            Pattern patternPort=Pattern.compile("\\d+");

            while(scanner.hasNextLine()) {

                if(start!=0){

                    //Loading a row from the file into table.

                    String[] data=patternRow.split(scanner.nextLine().replaceAll("[\"]",""));

                    //Preparing datetime.
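                    // (Note: SimpleDateFormat is relatively expensive to
                    // construct; it could be created once before the loop.)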

                    SimpleDateFormat simpleDateFormat=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");                                                          
                    GregorianCalendar calendar=new GregorianCalendar();
                    calendar.setTime(simpleDateFormat.parse(data[1]));
                    calendar.set(Calendar.MILLISECOND, Integer.parseInt(Pattern.compile("\\.").split(data[1])[1])/1000);

                    //Preparing an entity

                    ps.setLong(1, start++); //id PK
                    ps.setInt(2, addDataModel.getIdFk()); //id FK
                    ps.setTimestamp(3, new Timestamp(calendar.getTime().getTime())); //datetime
                    ps.setDouble(4, Double.parseDouble(data[2])); //time
                    ps.setString(5, data[3]); //sip
                    ps.setString(6, data[4]); //dip

                    if(!data[5].equals("") && patternPort.matcher(data[5]).matches()) ps.setInt(7, Integer.parseInt(data[5])); //sport
                    else ps.setNull(7, java.sql.Types.INTEGER);

                    if(!data[6].equals("") && patternPort.matcher(data[6]).matches()) ps.setInt(8, Integer.parseInt(data[6])); //dport
                    else ps.setNull(8, java.sql.Types.INTEGER);

                    ps.setString(9, data[7]); //proto

                    if(!data[8].trim().equals("")) ps.setLong(10, Long.parseLong(data[8])); //len
                    else ps.setObject(10, null);

                    if(data.length==10 && !data[9].trim().equals("")) ps.setString(11, data[9]); //info
                    else ps.setString(11, null);

                    ps.addBatch();

                    if(start%100000==0){
                        System.out.println("Number of entity: "+start);

                        ps.executeBatch();
                        ps.clearParameters();
                        ps.clearBatch();
                        con.commit();                       
                    }
                }
                else{
                    start++;
                    scanner.nextLine();
                }
            }

            if (scanner.ioException() != null) throw scanner.ioException();

        } catch(Throwable t){

            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        } finally{

            if (fileInputStream!=null)
                try {
                    fileInputStream.close();
                } catch (Throwable t2) {
                    CustomExceptionHandler exception=new CustomExceptionHandler(t2);
                    return exception.persist("DDI", "handleCSV.Finally");
                }
            if (scanner != null) scanner.close();
        }

        return true;
    }

    @Inject
    private EntityManager entityManager;

    @Resource(mappedName="java:/PostgresDS") 
    private DataSource ds;
}

Upvotes: 2

Views: 2079

Answers (2)

Steve C

Reputation: 19445

As you are using WildFly 10, you are in a Java EE 7 environment.

Therefore you should consider using JSR-352 Batch Processing to perform your file import.

Have a look at An Overview of Batch Processing in Java EE 7.0.

This should resolve all your memory consumption and transaction issues.
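
For illustration, the write side of a chunk step could be a JPA-backed ItemWriter like the sketch below. This is only a sketch: the class name DataStoreItemWriter is my placeholder, not something from the post, and it assumes the reader/processor of the step parses the CSV lines into DataStoreAll entities.

import java.util.List;

import javax.batch.api.chunk.AbstractItemWriter;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

// Hypothetical writer artifact for the chunk step.
@Named
public class DataStoreItemWriter extends AbstractItemWriter {

    @PersistenceContext
    private EntityManager em;

    @Override
    public void writeItems(List<Object> items) {
        // The batch runtime commits one transaction per chunk, so memory use
        // stays bounded by the chunk size instead of growing with the file.
        for (Object item : items) {
            em.persist(item);
        }
    }
}

The job itself would be described in a job XML under META-INF/batch-jobs and started with BatchRuntime.getJobOperator().start(...); the chunk's item-count setting then controls how many rows go into each transaction.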

Upvotes: 1

Robert Moskal

Reputation: 22553

Your problem is not necessarily the database or even Hibernate, but that you are loading way too much data into memory at once. That's why you get the out-of-memory error and why you see the JVM struggling on the way there.

You read the file from a stream, but then push it all into memory when you create the list of strings. Then you map that list of strings into a linked list of some sort of entity!

Instead, use the stream to process your file in small chunks and insert the chunks into your database. A Scanner-based approach would look something like this:

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // Talk to your database here!
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

You'll probably find the Hibernate/EJB stuff works well enough after you make this change. But I think you'll find plain JDBC to be significantly faster. They say you can expect a 3x to 4x speed bump, depending. That would make a big difference with a lot of data.
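
One related detail, offered as a sketch under assumptions rather than a definitive fix: reWriteBatchedInserts (which the question sets inside persistence.xml) is a connection parameter of the PostgreSQL JDBC driver, available since 9.4.1209, and it needs to reach the driver through the connection URL or connection properties (e.g. on the datasource) to have any effect. With plain JDBC it could look like this; the host, database and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class PgConnectionFactory {

    public static Connection open() throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "dbuser");      // placeholder
        props.setProperty("password", "secret");  // placeholder
        // Lets PgJDBC rewrite a batch of single-row INSERTs into
        // multi-row INSERT statements.
        props.setProperty("reWriteBatchedInserts", "true");
        return DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", props);
    }
}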

If you are talking about truly huge amounts of data, then you should look into the CopyManager, which lets you stream data directly into the database. You can use the streaming APIs to transform the data as it goes by.
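
A minimal sketch of that approach, assuming the CSV columns already line up with the table from the question (the unwrap call and the COPY options are assumptions to verify against your connection pool and data):

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CopyImport {

    public static long copyCsv(Connection con, String path) throws Exception {
        // Unwrap the (possibly pooled) connection to the driver-level API.
        CopyManager copyManager = new CopyManager(con.unwrap(BaseConnection.class));
        try (Reader reader = new FileReader(path)) {
            // Streams the whole file into the table in one COPY operation;
            // returns the number of rows imported.
            return copyManager.copyIn(
                    "COPY data_store_all (id, id_user_table, starttime, time, sip, "
                  + "dip, sport, dport, proto, totbytes, info) "
                  + "FROM STDIN WITH (FORMAT csv)",
                    reader);
        }
    }
}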

Upvotes: 2
