Daredevil
Daredevil

Reputation: 1802

Java- Multi-threading to improve performance on inserting data into db

I am building an index table (inverted file) on a table in MYSQL. The way it works is it extracts all the words from a file and store them to a hashset and then insert the words one by one to my database table.

It works perfectly and I am aware inverted file does take some time to build up the index table. I am trying to optimize the indexing time to the table, and I am considering using multi-threading. Will it speed up the performance?

However, I am not too sure with how to integrate it with my current program as I am new to multi-threading.

Code:

public static void main(String[] args) throws Exception {

        StopWatch stopwatch = new StopWatch();
        stopwatch.start();



        File folder = new File("D:\\PDF1");
        File[] listOfFiles = folder.listFiles();

        for (File file : listOfFiles) {
            if (file.isFile()) {
                HashSet<String> uniqueWords = new HashSet<>();
                String path = "D:\\PDF1\\" + file.getName();
                try (PDDocument document = PDDocument.load(new File(path))) {

                    if (!document.isEncrypted()) {

                        PDFTextStripper tStripper = new PDFTextStripper();
                        String pdfFileInText = tStripper.getText(document);
                        String lines[] = pdfFileInText.split("\\r?\\n");
                        for (String line : lines) {
                            String[] words = line.split(" ");

                            for (String word : words) {
                                uniqueWords.add(word)
                                ;

                            }

                        }
                        // System.out.println(uniqueWords);

                    }
                } catch (IOException e) {
                    System.err.println("Exception while trying to read pdf document - " + e);
                }
                Object[] words = uniqueWords.toArray();



                MysqlAccessIndex connection = new MysqlAccessIndex();

                for(int i = 1 ; i <= words.length - 1 ; i++ ) {

                    connection.readDataBase(path, words[i].toString());

                }

                System.out.println("Completed");

            }
        }

MySQL connection:

public class MysqlAccessIndex {
    public Connection connect = null;
    public Statement statement = null;
    public PreparedStatement preparedStatement = null;
    public ResultSet resultSet = null;

    public void connect() throws Exception {
        // This will load the MySQL driver, each DB has its own driver
        Class.forName("com.mysql.jdbc.Driver");
        // Setup the connection with the DB
        connect = DriverManager
                .getConnection("jdbc:mysql://126.32.3.20/fulltext_ltat?"
                        + "user=root&password=root");

        // Statements allow to issue SQL queries to the database
        statement = connect.createStatement();
        System.out.print("Connected");

    }
    public MysqlAccessIndex() throws Exception {

        connect();
    }


    public void readDataBase(String path,String word) throws Exception {
        try {

            // Result set get the result of the SQL query


            // This will load the MySQL driver, each DB has its own driver
            Class.forName("com.mysql.jdbc.Driver");
            // Setup the connection with the DB
            connect = DriverManager
                    .getConnection("jdbc:mysql://126.32.3.20/fulltext_ltat?"
                            + "user=root&password=root");

            // Statements allow to issue SQL queries to the database
            statement = connect.createStatement();
            System.out.print("Connected");
            // Result set get the result of the SQL query

            preparedStatement = connect
                    .prepareStatement("insert IGNORE into  fulltext_ltat.indextable values (default,?, ?) ");

            preparedStatement.setString(  1, path);
            preparedStatement.setString(2, word);
            preparedStatement.executeUpdate();
            // resultSet = statement
            //.executeQuery("select * from fulltext_ltat.index_detail");



            //  writeResultSet(resultSet);
        } catch (Exception e) {
            throw e;
        } finally {
            close();
        }

    }

I would appreciate any pointers.

Upvotes: 0

Views: 1601

Answers (1)

rzwitserloot
rzwitserloot

Reputation: 103893

No, pushing data into a database with multiple threads usually does not speed anything up.

Instead, try the following:

[1] when bulk-adding data, use the bulk-add data primitives your DB engine offers. I have no idea if mysql has support for this, and how to do that from java. In postgres, for example, you'd use COPY instead of INSERT.

[2] especially if you can't use COPY or similar, turn off all indices (delete them), then do all your inserts, then add the indices, this is faster than making indices first and then inserting.

[3] Use transactions, and commit the transaction every ~100 inserts or so. THis is faster than commit-after-every-insert and also faster than commit-after-hundreds-of-thousands, in most cases.

[4] start earlier. In your example code, you can start inserting right away instead of first stuffing all data into a hashset and then adding later.

[5] don't keep making prepared statements; reuse the same one.

[6] you make a statement, twice, and do nothing with it. Don't; you're wasting resources.

[7] preparedstatements need to be closed. You aren't closing them. That's probably going to slow things down considerably. Don't make so many (just one should do), and close them when you're done with it. Search for 'ARM', which is a java construct to make it easy to close resources properly. It's over 10 years old by now.

Upvotes: 3

Related Questions