Reputation: 21
I'm testing the to_sql
method of pandas DataFrame
.
to_sql
is using INSERT
SQL method and this is slower than COPY FROM
SQL method.
Can I select the COPY FROM
SQL method to be used by to_sql
, using SQLAlchemy and Postgresql (psycopg2) engine?
Upvotes: 0
Views: 2175
Reputation: 404
This isn't quite true. You can implement this to use COPY FROM
by providing your own function for the kwarg method
that to_sql
accepts. The key bit in the Pandas documentation for DataFrame.to_sql is where method
can be not only just a string, but a callable. That's what you'll provide
For example:
from csv import (writer as csv_writer, QUOTE_MINIMAL)
from io import StringIO
def some_function(df, connection):
try:
frame.to_sql(
'my_table',
connection,
schema='public',
if_exists='append',
index=False,
method=copy_from_method)
except Exception as err:
print('Got an error ({})'.format(str(err))
Then, you could implement copy_from_method
as something like this:
def copy_from_method(table, conn, keys, data_iter, pre_truncate=False, fatal_failure=False):
"Custom method for pandas.DataFrame.to_sql that will use COPY FROM"""
dbapi_conn = conn.connection
cur = dbapi_conn.cursor()
s_buf = StringIO()
writer = csv_writer(s_buf, quoting=QUOTE_MINIMAL)
writer.writerows(data_iter)
s_buf.seek(0)
columns = ', '.join('"{}"'.format(k) for k in keys)
table_name = '{}.{}'.format(
table.schema, table.name) if table.schema else table.name
sql_query = 'COPY %s (%s) FROM STDIN WITH CSV' % (table_name, columns)
cur.copy_expert(sql=sql_query, file=s_buf)
return cur.rowcount
The example copy_from_method
could use some work obviously- and you'll need to be careful with quoting and escaping with the CSV writer in some cases. But this rough version should work relatively well, barring any typos/copy paste errors (this is a stripped down version I use in a project)
EDIT: For big datasets, don’t use format(), f-strings are significantly faster
Upvotes: 2
Reputation: 139162
You are correct to_sql
is using INSERT INTO
via sqlalchemy (code where this happens is here), and so naturally you cannot use COPY FROM
using to_sql
.
But if you need this, you can of course make an own implementation of to_sql
using that. And if you think this has much better performance and the same functionality as the current implementation, you can always send a contribution to pandas (https://github.com/pydata/pandas/blob/master/CONTRIBUTING.md).
But, as far as I understand, if you want to execute an SQL statement as "COPY table FROM file"
, you can always just use the execute
command of the postgres engine.
Upvotes: 0