j riot
j riot

Reputation: 544

Python pandas to_sql 'append'

I am trying to send monthly data to a MySQL database using Python's pandas to_sql command. My program runs one month of data at a time and I want to append the new data onto the existing database. However, Python gives me an error:

_mysql_exceptions.OperationalError: (1050, "Table 'cps_basic_tabulation' already exists")

Here is my code for connecting and exporting:

conn = MySQLdb.connect(host     = config.get('db', 'host'),
                       user     = config.get('db', 'user'),
                       passwd   = config.get('db', 'password'),
                       db       = 'cps_raw') 

combined.to_sql(name            = "cps_raw.cps_basic_tabulation",
               con              = conn,
               flavor           = 'mysql', 
               if_exists        = 'append')

I have also tried using:

from sqlalchemy import create_engine

Replacing conn = MySQLdb.connect... with:

engine = mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>

conn   = engine.connect().connection

Any ideas on why I cannot append to a database?

Thanks!

Upvotes: 8

Views: 18128

Answers (2)

circle-ish
circle-ish

Reputation: 91

Since I had the same error message and stumbled across this post I leave this here for others to find.

I found two ways to solve the duplicated table creation although I lack the insight as to why this solves it:

  1. Either pass the database name in the url when creating a connection
  2. or pass the database name as a schema in pd.to_sql.

Doing both does not hurt. Also, a few years later it is (again?) possible to pass the pure connection to pandas. My guess would be that in the previous answer by joris the first of my solution cases might have implicitly solved the problem.

```
#create connection to MySQL DB via sqlalchemy & pymysql
user = credentials['user']
password = credentials['password']
port = credentials['port']
host = credentials['hostname']
dialect = 'mysql'
driver = 'pymysql'
db_name = 'test_db'

# setup SQLAlchemy   
from sqlalchemy import create_engine 
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/' 
engine = create_engine(cnx) 

# create database
with engine.begin() as con:
    con.execute(f"CREATE DATABASE {db_name}")

############################################################
# either pass the db_name  vvvv - HERE- vvvv after creating a database
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/{db_name}'      
############################################################
engine = create_engine(cnx) 

table = 'test_table'
col = 'test_col'
with engine.begin() as con:
    # this would work here instead of creating a new engine with a new link
    # con.execute(f"USE {db_name}")
    con.execute(f"CREATE TABLE {table} ({col} CHAR(1));")

# insert into database
import pandas as pd
df = pd.DataFrame({col : ['a','b','c']})

with engine.begin() as con:
    # this has no effect here
    # con.execute(f"USE {db_name}")
    df.to_sql(
        name= table,
        if_exists='append',
# passing con = cnx here would equally work
        con=con, 
############################################################
# or pass it as a schema vvvv - HERE - vvvv
        #schema=db_name,
############################################################
        index=False
    )```

Tested with python version 3.8.13, sqlalchemy 1.4.32 and pandas 1.4.2. Same problem might have appeared here and here.

Upvotes: 1

joris
joris

Reputation: 139172

Starting from pandas 0.14, you have to provide directly the sqlalchemy engine, and not the connection object:

engine = create_engine("mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>")
combined.to_sql("cps_raw.cps_basic_tabulation", engine, if_exists='append')

Upvotes: 7

Related Questions