Dirk Paul

Reputation: 137

Export pandas Dataframe or numpy array to MySQL

I'm using Python to export a large matrix (shape around 3000 × 3000) into MySQL.

Right now I'm using MySQLdb to insert those values, but it's cumbersome and inefficient. Here is my code:

# -*- coding:utf-8 -*-


import MySQLdb
import numpy as np
import pandas as pd
import time


def feature_to_sql_format(df):
    # Flatten the wide DataFrame into "(index, column, value), ..." tuples for one big INSERT
    df = df.fillna(value='')
    columns = list(df.columns)
    index = list(df.index)
    # Repeat each index value once per column so it lines up with the flattened values
    index_sort = np.reshape([[int(i)] * len(columns) for i in index], (-1)).tolist()
    # Repeat the full column list once per row
    columns_sort = (columns * len(index))
    values_sort = df.values.reshape(-1).tolist()
    # Strip the outer list brackets and unquote NULLs (Python 2: zip returns a list)
    return str(zip(index_sort, columns_sort, values_sort))[1: -1].replace("'NULL'", 'NULL')


if __name__ == '__main__':
    t1 = time.clock()
    df = pd.read_csv('C:\\test.csv', header=0, index_col=0)
    output_string = feature_to_sql_format(df)
    sql_CreateTable = 'USE derivative_pool;DROP TABLE IF exists test1;' \
                      'CREATE TABLE test1(date INT NOT NULL, code VARCHAR(12) NOT NULL, value FLOAT NULL);'
    sql_Insert = 'INSERT INTO test1 (date,code,value) VALUES ' + output_string + ';'
    con = MySQLdb.connect(......)
    cur = con.cursor()
    cur.execute(sql_CreateTable)
    cur.close()
    cur = con.cursor()
    cur.execute(sql_Insert)
    cur.close()
    con.commit()
    con.close()
    t2 = time.clock()
    print t2 - t1

In total it takes around 274 seconds.

I was wondering if there is a simpler way to do this. I thought of exporting the matrix to CSV and then using LOAD DATA INFILE to import it (roughly as sketched below), but that also seems complicated.
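For reference, a minimal sketch of that CSV route, assuming the server permits LOAD DATA LOCAL INFILE and the test1 table from the code above (the test_long.csv path is illustrative, not something I have actually set up):

# Sketch only: reshape the matrix to long (date, code, value) rows, then bulk-load it
df.stack().to_csv('C:\\test_long.csv')  # Series.to_csv writes index,column,value lines
con = MySQLdb.connect(......, local_infile=1)  # LOCAL INFILE must be enabled on both ends
cur = con.cursor()
cur.execute("LOAD DATA LOCAL INFILE 'C:/test_long.csv' INTO TABLE test1 "
            "FIELDS TERMINATED BY ',' (date, code, value);")
con.commit()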

I noticed in the pandas documentation that DataFrame has a to_sql function, and that in version 0.14 you could set the 'flavor' to 'mysql', i.e.:

df.to_sql(con=con, name=name, flavor='mysql')

But my pandas version is 0.19.2, and the flavor option has been reduced to 'sqlite' only. I still tried to use

df.to_sql(con=con, name=name, flavor='sqlite')

and it gives me an error.

Is there any convenient way to do this?

Upvotes: 0

Views: 964

Answers (1)

Sam

Reputation: 4090

Later pandas versions accept an SQLAlchemy engine instead of flavor='mysql'.

First, install dependencies:

pip install mysql-connector-python-rf==2.2.2
pip install MySQL-python==1.2.5
pip install SQLAlchemy==1.1.1

Then create the engine:

from sqlalchemy import create_engine
connection_string = "mysql+mysqlconnector://root:@localhost/MyDatabase"
engine = create_engine(connection_string)

Then you can use df.to_sql(...):

df.to_sql('MyTable', engine)
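For a wide matrix like the 3000 × 3000 one in the question, it can help to stack it into long form first and insert in batches; a hedged example (the date/code/value names follow the question's schema, and chunksize=1000 is an arbitrary choice):

# Wide matrix -> long (date, code, value) rows, inserted in batches of 1000
long_df = df.stack().reset_index()
long_df.columns = ['date', 'code', 'value']
long_df.to_sql('MyTable', engine, if_exists='replace', index=False, chunksize=1000)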

Here are some things you can do in MySQL to speed up your data load:

SET FOREIGN_KEY_CHECKS = 0;
SET UNIQUE_CHECKS = 0;
SET SESSION tx_isolation='READ-UNCOMMITTED';
SET sql_log_bin = 0;
#LOAD DATA LOCAL INFILE....
SET UNIQUE_CHECKS = 1;
SET FOREIGN_KEY_CHECKS = 1;
SET SESSION tx_isolation='REPEATABLE-READ';
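If you drive the load from Python, the same idea can be issued through the SQLAlchemy engine; a rough sketch (one transaction around the whole load, leaving out sql_log_bin, which needs SUPER privileges):

# Sketch: disable checks, bulk insert, re-enable checks, all in one transaction
with engine.begin() as conn:
    conn.execute("SET FOREIGN_KEY_CHECKS = 0")
    conn.execute("SET UNIQUE_CHECKS = 0")
    df.to_sql('MyTable', conn, if_exists='append', index=False)
    conn.execute("SET UNIQUE_CHECKS = 1")
    conn.execute("SET FOREIGN_KEY_CHECKS = 1")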

Upvotes: 1
