Reputation: 150
For whatever reason, I can easily read data from a postgres database using the pandas read_sql method, but even with exactly the same connection parameters, df.to_sql doesn't work. The operation seems to start, but it never finishes. (I've tried waiting for about 10 minutes.) The df itself is only about 50k rows and 20 columns. Here's what I've been doing:
from sqlalchemy import create_engine
from getpass import getpass
p = getpass("Input password")
in_string = 'postgresql://<USER>:{}@<SERVER>:5432/<DB>'.format(p)
engine = create_engine(in_string)
df.reset_index().to_sql('<TABLE_NAME>', engine, schema='<SCHEMA>', index=False, chunksize=500)
Another interesting tidbit: a new table does get created in the database, but it's never populated with any data whatsoever. The column names and datatypes all seem to be correct, though.
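For reference, a quick way to watch for incoming rows from a second session while to_sql runs (purely diagnostic; the placeholders and in_string mirror the ones above):
# Purely diagnostic: count rows in the target table from a second session
# while to_sql is running. Rebuild in_string as above.
from sqlalchemy import create_engine, text

check_engine = create_engine(in_string)
with check_engine.connect() as conn:
    print(conn.execute(text('SELECT count(*) FROM <SCHEMA>.<TABLE_NAME>')).scalar())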
I've tried changing chunksizes, reindexing, and just waiting. It's all the more frustrating because there's no error. Has anyone else encountered this problem with pandas to_sql? Any ideas what could be causing it?
EDIT:
Minutes after posting this, it finished. In total, I think it took about 20 minutes, and the index was passed as a column despite the index=False argument. The question remains, though: I suspect this is slow because, under the hood, pandas inserts every single row individually. Are there any faster alternatives I could use from within Python?
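One way to check the row-by-row suspicion is SQLAlchemy's echo flag, which logs every statement the engine emits (diagnostic only, not a fix):
# echo=True makes SQLAlchemy log each statement it sends, so you can see
# whether to_sql issues a separate INSERT per row.
engine = create_engine(in_string, echo=True)
df.reset_index().to_sql('<TABLE_NAME>', engine, schema='<SCHEMA>', index=False, chunksize=500)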
EDIT2: I should have mentioned this earlier. I'm trying to write a script that iterates through a list of csv files (about 20) and "copies" them into non-existent tables in a psql database. The nice thing about pandas is that it infers the dtypes in an intelligent way, translates them into sql's (to me, at least) arcane data types, creates the tables, and then inserts the data.
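Roughly, the loop I have in mind looks like this (csv_dir is illustrative, the schema placeholder mirrors the one above, and if_exists='fail' just makes to_sql raise if a table somehow already exists):
import os
import pandas as pd

csv_dir = '/path/to/csvs'  # illustrative; wherever the ~20 csv files live
for fname in os.listdir(csv_dir):
    if not fname.endswith('.csv'):
        continue
    table_name = os.path.splitext(fname)[0]
    df = pd.read_csv(os.path.join(csv_dir, fname))
    # pandas infers dtypes, creates the table, then inserts the rows
    df.to_sql(table_name, engine, schema='<SCHEMA>', index=False,
              if_exists='fail', chunksize=500)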
Upvotes: 2
Views: 1758
Reputation: 150
After spending a lot of time first trying to write a bash script that uses csvkit to determine data types for a psql CREATE TABLE command and then copies the data in, I decided the best solution is still to go the pandas to_sql route.
There's an existing issue here detailing the problems with how pandas performs inserts one row at a time.
From the link above, I applied the following simple monkey patch (thanks to user nhockam) before calling to_sql. This seems to speed things up substantially.
from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    # build one multi-row INSERT instead of one statement per row
    print("Using monkey-patched _execute_insert")
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

SQLTable._execute_insert = _execute_insert
Although I didn't time it myself, it looks like other users have on the issue page. For now, this seems like a reasonably good solution for batch importing csv files with Python via pandas.
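As a side note, newer pandas versions expose a method argument on to_sql that requests a similar multi-row insert without any patching; I haven't benchmarked it against this workload, so treat it as an untested alternative:
# On newer pandas versions (where to_sql accepts a method argument), this asks
# for multi-row INSERTs directly instead of monkey patching. Untested here.
df.to_sql('<TABLE_NAME>', engine, schema='<SCHEMA>', index=False,
          chunksize=500, method='multi')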
Upvotes: 1