Sagar Nepali

Reputation: 179

Java: Best way to generate SQL after parsing an XML file using JAXB and insert into a database without duplicates?

I've been assigned a task to unmarshal an XML file using JAXB, generate the corresponding SQL statements, and run them against the database. I've used the following method to generate the list of SQL statements.

public List<String> getSqlOfNationalityList(File file)
        throws JAXBException, FileNotFoundException, UnsupportedEncodingException {
    List<String> unNationalityList = new ArrayList<String>();
    JAXBContext jaxbContext = JAXBContext.newInstance(ObjectFactory.class);
    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

    CONSOLIDATEDLIST consolidate = (CONSOLIDATEDLIST) unmarshaller.unmarshal(file);
    // accessing individuals properties
    INDIVIDUALS individuals = consolidate.getINDIVIDUALS();
    List<INDIVIDUAL> list = individuals.getINDIVIDUAL();

    for (INDIVIDUAL individual : list) {
        NATIONALITY nationality = individual.getNATIONALITY();
        if (nationality != null) {
            List<String> values = nationality.getVALUE();
            if (values != null) {
                for (String value : values) {
                    String string2 = "";
                    StringBuilder builder = new StringBuilder();
                    builder.append("INSERT INTO LIST_UN_NATIONALITY");
                    builder.append("(\"DATAID\",\"VALUE\")");
                    builder.append(" VALUES(");
                    string2 = string2.concat("'" + individual.getDATAID() + "',");
                    if ("null ".contentEquals(value + " ")) {
                        string2 = string2.concat("' ',");
                    } else {
                        string2 = string2.concat("'" + value.replace("'", "/") + "',");
                    }

                    if (string2.length() > 0) {
                        builder.append(string2.substring(0, string2.length() - 1));
                    }
                    builder.append(");");
                    builder.append("\r\n");
                    unNationalityList.add(builder.toString());
                }
            }
        }
    }
    return unNationalityList;
} // end of nationality list method

I have used the following method to read from the list and insert into the database.

private void readListAndInsertToDb(List<String> list) {

    int duplicateCount = 0;
    int totalCount = 0;

    for (String sql : list) {
        try {
            jdbcTemplate.update(sql);
        } catch (DuplicateKeyException dke) {
            // dke.printStackTrace();
            duplicateCount++;
        } catch (DataAccessException e) {
            e.printStackTrace();
        }
        totalCount++;
    } // end of for
    // Note: JdbcTemplate throws the unchecked DataAccessException, never the
    // checked SQLException, so a surrounding try/catch for SQLException
    // would not even compile.

    System.out.println("\r\nTotal : " + totalCount);
    System.out.println("Total duplicate : " + duplicateCount);
}

Now the issue is, I have about 13-14 similar lists, and the XML file contains records which may already exist in the database.

  1. How can I run these queries without creating duplicate entries in the PostgreSQL database?
  2. What is the optimal way to do this? It would be great if it could be executed in batches.

Upvotes: 1

Views: 597

Answers (1)

Craig Ringer

Reputation: 324801

No, no, don't generate a list of SQL statements. Especially don't interpolate values into them as strings! Awoogah, awoogah, SQL injection alert.

Don't use a try/catch approach for duplicate handling either.

Improvements are, from simple and easy to harder but best:

  • At bare minimum use a PreparedStatement with bind parameters. Prepare it once. Then execute it for each input, with the parameters from the current data row.

    You cannot rely on every driver throwing DuplicateKeyException, so you should also catch SQLException and check the SQLSTATE. Unless, of course, you plan on using one specific DBMS and your code checks that you're using the expected driver + version.
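    A minimal sketch of that first step, assuming plain JDBC and the LIST_UN_NATIONALITY table from the question (the class name and the row representation are illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class NationalityInserter {
    // Table and columns are from the question; the class name is hypothetical.
    static final String INSERT_SQL =
        "INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") VALUES (?, ?)";

    // Prepare once, execute per row; the driver handles all quoting and
    // escaping, so there is no SQL injection and no manual apostrophe mangling.
    void insertAll(Connection conn, List<String[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]); // DATAID
                ps.setString(2, row[1]); // VALUE
                // On PostgreSQL a duplicate surfaces as an SQLException with
                // SQLSTATE 23505 (unique_violation); check that code rather
                // than relying on a driver-specific exception class.
                ps.executeUpdate();
            }
        }
    }
}
```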

  • Better, use PostgreSQL's INSERT ... ON CONFLICT DO NOTHING feature to handle conflicts without needing exception handling. This lets you batch your inserts, doing many per transaction for better performance.
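    A sketch of that variant with JDBC batching, assuming a unique constraint covers the duplicate-detection key (table name from the question; class name illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ConflictFreeInserter {
    // ON CONFLICT DO NOTHING silently skips rows that would violate any
    // unique constraint, so no exception handling is needed for duplicates.
    static final String UPSERT_SQL =
        "INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") VALUES (?, ?) "
        + "ON CONFLICT DO NOTHING";

    void insertBatch(Connection conn, List<String[]> rows) throws SQLException {
        conn.setAutoCommit(false); // many inserts per transaction
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]); // DATAID
                ps.setString(2, row[1]); // VALUE
                ps.addBatch();
            }
            ps.executeBatch(); // one round trip for the whole batch
            conn.commit();
        }
    }
}
```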

  • Further improve performance by using a multi-row VALUES list for INSERT ... ON CONFLICT DO NOTHING.
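    One way to build the multi-row statement: generate a VALUES list with one (?,?) group per row, then bind 2*n parameters before executing. A sketch (the helper name is hypothetical):

```java
public class MultiRowSql {
    // Builds "INSERT ... VALUES (?,?),(?,?),... ON CONFLICT DO NOTHING"
    // for n rows; the caller binds 2*n parameters before executing.
    static String multiRowInsert(int n) {
        StringBuilder sb = new StringBuilder(
            "INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") VALUES ");
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(",");
            sb.append("(?,?)");
        }
        sb.append(" ON CONFLICT DO NOTHING");
        return sb.toString();
    }
}
```

    Note that only the placeholder structure is generated; the actual values still go through bind parameters, so this stays injection-safe.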

  • Even better, COPY all the data, including duplicates, into a TEMPORARY table using PgJDBC's CopyManager interface (see PGConnection.getCopyAPI()), create an index on the key used for duplicate detection, then LOCK the destination table and do a bulk

     INSERT INTO real_table
     SELECT ...
     FROM temp_table
     WHERE NOT EXISTS (SELECT 1 FROM real_table WHERE temp_table.key = real_table.key)
    

    or similar. This will be way faster. You can use INSERT ... ON CONFLICT DO NOTHING instead if you're on a new enough PostgreSQL (9.5+).
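    The whole temp-table route might look roughly like this; the temp-table name and key columns are illustrative, and the CopyManager call is shown in a comment so the sketch compiles without PgJDBC on the classpath:

```java
import java.io.Reader;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class CopyDedupLoader {
    // Insert only the rows not already present, keyed here on
    // (DATAID, VALUE); adjust the key columns to your real constraint.
    static final String DEDUP_INSERT =
        "INSERT INTO LIST_UN_NATIONALITY "
        + "SELECT t.* FROM tmp_un_nationality t "
        + "WHERE NOT EXISTS (SELECT 1 FROM LIST_UN_NATIONALITY r "
        + "WHERE r.\"DATAID\" = t.\"DATAID\" AND r.\"VALUE\" = t.\"VALUE\")";

    void load(Connection conn, Reader csvData) throws SQLException {
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TEMPORARY TABLE tmp_un_nationality "
                     + "(LIKE LIST_UN_NATIONALITY) ON COMMIT DROP");
            // With PgJDBC on the classpath, bulk-load via COPY:
            //   CopyManager copy =
            //       conn.unwrap(org.postgresql.PGConnection.class).getCopyAPI();
            //   copy.copyIn("COPY tmp_un_nationality FROM STDIN "
            //             + "WITH (FORMAT csv)", csvData);
            st.execute("LOCK TABLE LIST_UN_NATIONALITY IN EXCLUSIVE MODE");
            st.execute(DEDUP_INSERT);
            conn.commit();
        }
    }
}
```

    The LOCK prevents concurrent writers from inserting a duplicate between the NOT EXISTS check and the insert.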

Upvotes: 2
