thomas
thomas

Reputation: 1174

Efficient way to load excel rows to a database?

We have large excel files (up to 100k rows with up to 150 columns, around 30% of the columns containing long text objects. The file size varries from 10mb to 60mb).

I need to load these to our Oracle database in the most efficient way.

I thought of loading each Excel file with xlrd, and then creating an INSERT statement for every Excel row, and execute my SQL with cx_Oracle.

My questions are -

  1. Should I create an INSERT statement for every single row, and then execute it (meaning plenty of cursor.execute(sql) calls), or should I have one massive string with all my inserts (separated by a semicolon), which means I have only one cursor.execute(sql) call?

  2. Assuming the whole operation takes lots of time, is there a possibility I'll get a timeout from the database? I remember having something similar a while ago, I was running a very long query via cx_Oracle and after a few hours there was a timeout and the connection was lost. What can be done about this?

  3. Is the concept any good? Perhaps there's a better way to implement this?

Thanks!

Upvotes: 0

Views: 1264

Answers (1)

FrobberOfBits
FrobberOfBits

Reputation: 18002

The short answer is that you should use Cursor.prepare(statement[, tag]) and Cursor.executemany(statement, parameters).

Preparing the statement ahead of time saves you recompiling the same SQL INSERT over and over again. Do not do one insert per row, it will be slow.

Executmany does many inserts in one fell swoop.

Now, there are two basic ways you could go -- batching is one way. The problem with batching is that you have to design the batch approach, choose the number, and so on. Larger batches will be more efficient, but you have to consider the memory tradeoff (you probably can't fit everything at once).

Your other approach is to use a generator in python. That is, make a lazy-constructed list that's generated as needed. You can effectively create a list of infinite size by generating it as its consumed by executemany.

Also, please read the answer to this related question about high performance inserts with python/oracle.

Upvotes: 1

Related Questions