Phaneendra
Phaneendra

Reputation: 43

How to insert 780k records Spring Data Postgres

CSV file will be uploaded in Amazon S3 bucket.

Current Implementation:

  1. Reading CSV file using OpenCSV utility's readAll which returns List
  2. Map every column value to an entity object
  3. Create a List object (Current Size :: 15000)
  4. Invoke Spring Data CRUD Repository saveAll(List objects)
  5. Out of 780k records only 570k records were persisted in the database.
  6. We do not see any exception / error after these many records insertion.

Absolutely no clue whats happening there after

After googling a bit found that copy command in PostgreSQL is the fastest way to READ/ WRITE in to PostgreSQL DB.

This is also supported in multiple languages Java, C, Python.

Tried using CopyManager API of PostgreSQL which has got overloaded versions of copyIn method.

copyIn(String sql, InputStream is) - We cannot use this since our csv contains Header Columns in the first line & we want to skip first line.

copyIn(String sql, Reader rd) - This doesn't accept CSVReader object as it is not of type java.io.Reader

Technologies

Upvotes: 1

Views: 2275

Answers (3)

OrangeDog
OrangeDog

Reputation: 38749

We cannot use this since our csv contains Header Columns in the first line & we want to skip first line.

That's what COPY ... WITH HEADER is for.

Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.

Upvotes: 1

Phaneendra
Phaneendra

Reputation: 43

I was able to figure out code flaw. We've provided one of the db columns as Unique constraint which need not be, whereas in CSV file there're duplicate values of this column. For this reason record insertion was failing.

Thanks, Phaneendra

Upvotes: 2

user330315
user330315

Reputation:

We cannot use this since our csv contains Header Columns in the first line & we want to skip first line.

So then just skip the first line. copyIn() does not "rewind" the reader it gets. You also don't need to use a CSVReader. The copy statement will take care of parsing the CSV file. All you need to do is provide a reader that is positioned at the first line that should be processed.

Something along the lines:

CopyManager mgr = new CopyManager(...);

BufferedReader in = new BufferedReader(new FileReader(...));
in.readLine(); // skip the header line

mgr.copyIn("copy target_table from stdin with (...)", in);

Make sure you provide the appropriate options in the copy command so that the file can be parsed correctly.

Upvotes: 0

Related Questions