RoySinha

Reputation: 69

How to insert a million records into a database

I am trying to insert 1 million records into a DB table.

I want to create at least 3 threads that each fire one insert, so that we get 3 parallel requests every second. We can make the inserts fire at exactly the same time by having each thread wait and wake up on the same signal. Each thread then sleeps until the 1-second window elapses, and the whole process repeats. How can I accomplish this?

Any suggestion or tips will be helpful.
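
For context, a sketch of the kind of synchronization I mean is below. This is only an illustration: insertOneRecord is a placeholder for whatever JDBC insert actually runs, and the class name is made up.

import java.util.concurrent.CyclicBarrier;

public class TimedInserter {

    private static final int THREADS = 3;

    public static void main(String[] args) {
        //All threads block on the barrier and wake up together each round
        CyclicBarrier barrier = new CyclicBarrier(THREADS);

        for (int t = 0; t < THREADS; t++) {
            new Thread(() -> {
                try {
                    while (true) {
                        barrier.await();    //wait until all 3 threads are ready, then fire together
                        long start = System.currentTimeMillis();
                        insertOneRecord();  //placeholder for the actual JDBC insert
                        //sleep away the rest of the 1-second window, then repeat
                        long remaining = 1000 - (System.currentTimeMillis() - start);
                        if (remaining > 0) {
                            Thread.sleep(remaining);
                        }
                    }
                } catch (Exception e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }

    private static void insertOneRecord() {
        //placeholder: run a single INSERT over JDBC here
    }
}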

Upvotes: 6

Views: 17317

Answers (5)

Mayank

Reputation: 89

We can make use of SQL's multi-row INSERT syntax to insert multiple rows at once:

INSERT INTO table_name (col1, col2, col3) VALUES (val1, val2, val3),
(val4, val5, val6), ... (valx, valy, valz)

I have tested this on SQL Server. I believe it should work on other databases as well (with minor modifications in the SQL query, if needed).

Note: I am using Spring Data JPA and Java 8+.

In this example, my Entity class is like this:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "test")
public class Test {
    @Id
    private Long id;
    private String name;
    @Column(name = "age", columnDefinition = "int default 0", nullable = false)
    private int age;
    //Getters and Setters
}

And here is the class which performs the relevant task:

import java.util.List;
import java.util.stream.Collectors;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class MyService {

    @PersistenceContext
    EntityManager entityManager;

    /*
     * This method takes a List of POJOs and inserts them as records into the database table.
     * Returns the number of records thus inserted.
     */
    @Transactional
    public int batchInsert(List<Test> list) {
    
        int totalRecordsInserted = 0;

        //How many records to insert in a single insert statement
        //(SQL Server allows max 1000)
        int count = 1000; 

        //Construct SQL query like this:
        //INSERT INTO test (id, age, name) 
        //VALUES (1, 1, 'Test 1'), (2, 2, 'Test 2'),
        //(3, 3, 'Test 3'), ... (1000, 1000, 'Test 1000')
        String sqlQueryPart1 = "INSERT INTO test (id, age, name) VALUES\r\n";

        for (int i = 0; i < list.size(); i += count) {

            StringBuilder queryBuilder = new StringBuilder();

            //Take in chunks the next 1000 elements from the main list
            //and append to the SQL query the corresponding values
            list.stream().skip(i).limit(count).forEach(
                x -> queryBuilder.append("(" + x.getId() + "," + x.getAge() + "," + parse(x.getName()) + ")\r\n,"));

            //Remove the unwanted last comma sign from the batch insert query
            String sqlQuery = sqlQueryPart1 + queryBuilder.deleteCharAt(queryBuilder.length() - 1).toString();

            //Run the SQL query to insert these 1000 records
            int recordsInserted = entityManager.createNativeQuery(sqlQuery).executeUpdate();
                        
            totalRecordsInserted += recordsInserted;

        }

        return totalRecordsInserted;

    }

    //Added this method to enclose a String value in single quotes
    //And also to take care of null or single quotes in the String field
    private String parse(String x) {
    
        if(x == null)
            return "NULL";
    
        else if(x.contains("'"))
            return "'" + x.replaceAll("'", "''") + "'";
    
        else
            return "'" + x + "'";
    
    }
}
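
A hypothetical caller could feed this a million records like so. The setters and the myService bean are assumed here (the answer only shows "//Getters and Setters"), and the generated values are illustrative:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

//Build 1,000,000 Test objects and hand them to batchInsert
List<Test> records = LongStream.rangeClosed(1, 1_000_000)
        .mapToObj(i -> {
            Test t = new Test();
            t.setId(i);
            t.setAge((int) (i % 100));
            t.setName("Test " + i);
            return t;
        })
        .collect(Collectors.toList());

int inserted = myService.batchInsert(records);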

Upvotes: 0

slipperyseal

Reputation: 2778

This is a quick example of a batch insert, based on mkyong's code from http://www.mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/

This basically gives you the speed of sqlloader, because it does batch inserts. Only one thread should be used.

What I have done here is put the inserts into a loop to show how to flush and clear the batch every few thousand records.

The loop bound below is just a placeholder for the question's 1 million rows; you would drive the loop from your own data source instead of the hard-coded mkyong values.

String insertTableSQL = "INSERT INTO DBUSER"
        + "(USER_ID, USERNAME, CREATED_BY, CREATED_DATE) VALUES"
        + "(?,?,?,?)";

PreparedStatement preparedStatement = dbConnection.prepareStatement(insertTableSQL);
try {
    dbConnection.setAutoCommit(false);

    int batchTotal = 0;
    for (int i = 0; i < 1_000_000; i++) { // placeholder loop; fetch your own data here
        preparedStatement.setInt(1, 101);
        preparedStatement.setString(2, "mkyong101");
        preparedStatement.setString(3, "system");
        preparedStatement.setTimestamp(4, getCurrentTimeStamp());
        preparedStatement.addBatch();
        if (++batchTotal == 4096) {       // flush the batch every 4096 rows
            preparedStatement.executeBatch();
            preparedStatement.clearBatch();
            batchTotal = 0;
        }
    }
    if (batchTotal > 0) {                 // flush any remaining rows
        preparedStatement.executeBatch();
    }

    dbConnection.commit();
} finally {
    preparedStatement.close();
}

Upvotes: 4

Nisha Gupta

Reputation: 37

I have always used sqlloader or SQL Developer to load huge amounts of data. A Java application inserting the data makes sense when you only have a delta; for a first-time setup or a data migration, sqlloader or a similar option is best.

Upvotes: 1

jayesh

Reputation: 2492

Yes, inserting a large amount of data record by record is not good practice: it takes a long time and increases the load on the database. Even if you split the insert process across threads, the database overhead and the communication with the database will still slow things down. The best ways to import are:

1) Take a SQL dump of your data if it comes from an old database, and import it into the new database; this is super fast.

2) If you want to import using a program, use batch processing, which lets you insert multiple records at once and reduces transaction overhead.

3) Many database management UI tools let you import from CSV or Excel files, which is also fast and helps when your data is already in files (see the sketch after this list).

There are other ways, but one of the above three should fit your case.
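
As an illustration of option 3, a CSV file can also be loaded server-side with plain SQL on MySQL (the file, table, and column names here are hypothetical, matching the test table used earlier in this thread):

LOAD DATA INFILE '/tmp/test_data.csv'
INTO TABLE test
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, age, name);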

Upvotes: 1

Gaurava Agarwal

Reputation: 974

Inserting a large amount of data through the application is considered bad practice. It will take a lot of time, while you can load the data into the table directly using sqlloader or a similar tool. These loaders are faster because they avoid the overhead of transaction management.

Here are some questions I ask myself when I have huge amounts of data to load into a database:

  1. Is this reference/static data (like countries, cities, banks)?
  2. Is this dynamic data generated by the application or by its use (like bills, call logs, payments, account activity)?

In most cases it is case 1, and then sqlloader is preferred.

The other case (2) may arise due to a data migration or upgrade; sqlloader is preferred there as well.

Based on the above explanation, you may choose to describe your problem in more detail, or choose sqlloader.
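
For reference, a minimal SQL*Loader setup might look like this (the file names, credentials, and target table here are hypothetical, just to show the shape of a load):

-- test.ctl (control file)
LOAD DATA
INFILE 'test_data.csv'
INTO TABLE test
FIELDS TERMINATED BY ','
(id, age, name)

run with something like:

sqlldr userid=scott/tiger control=test.ctl log=test.log direct=true

The direct=true flag requests a direct-path load, which is what lets the loader skip most of the transaction-management overhead mentioned above.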

Upvotes: 4
