BCArg

Reputation: 2250

Cannot populate MySQL database from pandas dataframe with if_exists='append'

I am trying to write a script that populates a MySQL database from multiple pandas dataframes. For the sake of simplicity, I will show here the code that populates the database from a single dataframe.

I am connecting to the db as follows:

import mysql.connector
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine

# create the cursor and the connector
conn = mysql.connector.connect(
        host='localhost',
        user='root',
        password='my_password')

c = conn.cursor(buffered=True)

# Create the database
c.execute('CREATE DATABASE IF NOT EXISTS ss_json_interop')


# Connect now to the ss_json_interop database
conn = mysql.connector.connect(
            host='localhost',
            user='root',
            password='my_password', 
            database='ss_json_interop')

c = conn.cursor(buffered=True)



#### Create the table
c.execute("""CREATE TABLE IF NOT EXISTS sample_sheet_stats_json (
        ss_ID int NOT NULL AUTO_INCREMENT,
        panel text,
        run_ID text,
        sample_ID text,
        i7_index_ID text,
        i7_index_seq text,
        i5_index_ID text,
        i5_index_seq text,
        number_reads_lane1 varchar(255),
        number_reads_lane2 varchar(255),
        total_reads varchar(255),
        PRIMARY KEY (ss_ID)
        )""")


#### create the engine
# more here: https://stackoverflow.com/questions/16476413/how-to-insert-pandas-dataframe-via-mysqldb-into-database
database_username = 'root'
database_password = 'my_password'
database_ip       = '127.0.0.1'
database_name     = 'ss_json_interop'
database_connection = sqlalchemy.create_engine('mysql+mysqlconnector://{0}:{1}@{2}/{3}'.
                                               format(database_username, database_password, 
                                                      database_ip, database_name))

# define the engine
engine = create_engine("mysql+mysqldb://root:my_password@localhost/sample_sheet_stats_json")

I am trying to populate my df into a table called sample_sheet_stats_json. If I do:

df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='replace')

the command works and the table in the db is correctly populated. However, if I replace the if_exists='replace' by if_exists='append':

df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append')

I get a long error message like the one below (it is truncated here; the rest goes on to replicate the structure of my df):

(mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column 'index' in 'field list' [SQL: 'INSERT INTO sample_sheet_stats_json 

Strangely enough, I can run df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append') as long as I have run df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='replace') beforehand, i.e. if the table has already been populated by pandas.

The same problem was already reported here. However, if I follow that approach and do:

df.to_sql('sample_sheet_stats_json', engine, if_exists='append')

I get the following error message:

(_mysql_exceptions.OperationalError) (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)") (Background on this error at: http://sqlalche.me/e/e3q8)

which does not make much sense, as I could already connect to the database with other commands, as shown above.

Does anyone know how I can fix it?

Upvotes: 3

Views: 3423

Answers (1)

BCArg

Reputation: 2250

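I have figured out what happened. The error message says that there is no column named index in the MySQL table, which is in fact true: by default, to_sql also writes the dataframe's index as a column (called index when the index is unnamed), and the table I created has no such column. This is also why append works after a replace, since replace recreates the table from the dataframe, index column included.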

Therefore, I simply have to pass the argument index=False to df.to_sql:

df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append', index=False)
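
The behaviour can be reproduced without a MySQL server. The following is a minimal sketch that assumes an in-memory SQLite database as a stand-in for MySQL and uses a shortened, hypothetical version of the table (only two of the original columns); the exact exception type and message differ between drivers, so the sketch catches a generic Exception.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")

# Shortened version of the table from the question: note there is no "index" column.
conn.execute(
    """CREATE TABLE sample_sheet_stats_json (
        ss_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        panel TEXT,
        run_ID TEXT
    )"""
)

# Hypothetical sample data with a default (unnamed) index.
df = pd.DataFrame({"panel": ["p1", "p2"], "run_ID": ["r1", "r2"]})

# With the default index=True, pandas also tries to insert a column named "index",
# which the pre-created table does not have.
try:
    df.to_sql("sample_sheet_stats_json", con=conn, if_exists="append")
    default_append_failed = False
except Exception as exc:
    default_append_failed = True
    print("append with the index failed:", exc)

# With index=False, only the dataframe's own columns are inserted.
df.to_sql("sample_sheet_stats_json", con=conn, if_exists="append", index=False)

rows = conn.execute(
    "SELECT ss_ID, panel, run_ID FROM sample_sheet_stats_json ORDER BY ss_ID"
).fetchall()
print(rows)
```

The auto-increment primary key ss_ID is filled in by the database, so the dataframe does not need to supply it.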

And that solves the problem.
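
The second error in the question (2002, socket '/tmp/mysql.sock') looks like a separate issue with the second connection URL rather than with to_sql. As a hedged sketch, the two URLs from the question can be parsed with SQLAlchemy's make_url and compared without connecting to anything; the variable names url_a and url_b are only illustrative.

```python
from sqlalchemy.engine.url import make_url

# The URL that worked in the question (database_connection).
url_a = make_url("mysql+mysqlconnector://root:my_password@127.0.0.1/ss_json_interop")

# The URL that produced the 2002 socket error (engine).
url_b = make_url("mysql+mysqldb://root:my_password@localhost/sample_sheet_stats_json")

# Print the parts that differ: driver, host, and database name.
for name, url in (("database_connection", url_a), ("engine", url_b)):
    print(name, "| driver:", url.drivername, "| host:", url.host, "| database:", url.database)
```

The second URL uses a different driver (mysqldb), the host localhost instead of 127.0.0.1, and the table name where the database name ss_json_interop would be expected. On many setups the MySQL client library treats localhost as a request for a Unix socket rather than a TCP connection, which would be consistent with the socket path in the error, although that has not been verified for this particular installation.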

Upvotes: 8
