MichM

Reputation: 896

pandas to_sql truncates my data

I was using df.to_sql(con=con_mysql, name='testdata', if_exists='replace', flavor='mysql') to export a data frame into MySQL. However, I discovered that columns with long string content (such as url) are truncated to 63 characters. I received the following warning from the IPython notebook when I exported:

/usr/local/lib/python2.7/site-packages/pandas/io/sql.py:248: Warning: Data truncated for column 'url' at row 3
  cur.executemany(insert_query, data)

There were other warnings in the same style for different rows.

Is there anything I can tweak to export the full data properly? I could set up the correct data schema in mysql and then export to that. But I am hoping a tweak can just make it work straight from python.
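For reference, the manual-schema route would look roughly like this (a sketch, assuming the same con_mysql connection and spelling out only the url column):

cur = con_mysql.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS testdata (
        url TEXT
        -- ... remaining columns with appropriate types
    )
""")
con_mysql.commit()

# append into the pre-created table instead of letting pandas
# generate its own (truncating) VARCHAR(63) schema
df.to_sql(con=con_mysql, name='testdata', if_exists='append', flavor='mysql')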

Upvotes: 3

Views: 7747

Answers (2)

ted.strauss

Reputation: 4339

Inspired by @joris's answer, I decided to hard-code the change into the pandas source and recompile.

cd /usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io
sudo pico sql.py
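(If your pandas install lives somewhere else, the file can be located from Python itself; a quick check:)

import pandas.io.sql
print(pandas.io.sql.__file__)   # path of the installed sql.py (may end in .pyc)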

changed line 871

'mysql': 'VARCHAR (63)',

to

'mysql': 'VARCHAR (255)',

then recompiled just that file

sudo python -m py_compile sql.py
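(The same recompile can also be done from inside a Python session, if you prefer; a sketch, run from that directory:)

import py_compile
py_compile.compile('sql.py')   # regenerates sql.pyc next to the edited file (Python 2.7)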

restarted my script, and the to_sql() function wrote the table. (I expected the recompile to have broken pandas, but it seems not to have.)

Here is my script to write a dataframe to MySQL, for reference:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sqlalchemy
from sqlalchemy import create_engine

df = pd.read_csv('10k.csv')
## ... dataframe munging
df = df.where(pd.notnull(df), None)  # workaround for NaN bug

engine = create_engine('mysql://user:password@localhost:3306/dbname')
con = engine.connect().connection  # raw DBAPI connection for the legacy to_sql API
df.to_sql("issues", con, 'mysql', if_exists='replace', index=True, index_label=None)  # 'mysql' here is the flavor

Upvotes: 5

joris

Reputation: 139172

If you are using pandas 0.13.1 or older, this limit of 63 characters is indeed hardcoded, because of this line in the code: https://github.com/pydata/pandas/blob/v0.13.1/pandas/io/sql.py#L278

As a workaround, you could monkeypatch that get_sqltype function:

from pandas.io import sql
import numpy as np              # the np.* type checks below need this
from datetime import datetime   # the datetime checks below need this

def get_sqltype(pytype, flavor):
    sqltype = {'mysql': 'VARCHAR (63)',    # <-- change this value to something sufficiently larger
               'sqlite': 'TEXT'}

    if issubclass(pytype, np.floating):
        sqltype['mysql'] = 'FLOAT'
        sqltype['sqlite'] = 'REAL'
    if issubclass(pytype, np.integer):
        sqltype['mysql'] = 'BIGINT'
        sqltype['sqlite'] = 'INTEGER'
    if issubclass(pytype, np.datetime64) or pytype is datetime:
        sqltype['mysql'] = 'DATETIME'
        sqltype['sqlite'] = 'TIMESTAMP'
    if pytype is datetime.date:
        sqltype['mysql'] = 'DATE'
        sqltype['sqlite'] = 'TIMESTAMP'
    if issubclass(pytype, np.bool_):
        sqltype['sqlite'] = 'INTEGER'

    return sqltype[flavor]

sql.get_sqltype = get_sqltype

And then your original code should just work:

df.to_sql(con=con_mysql, name='testdata', if_exists='replace', flavor='mysql')

Starting from pandas 0.14, the sql module uses sqlalchemy under the hood, and strings are converted to the sqlalchemy TEXT type, which is mapped to the MySQL TEXT type (and not VARCHAR). This also allows you to store strings longer than 63 characters:

import sqlalchemy

engine = sqlalchemy.create_engine('mysql://scott:tiger@localhost/foo')
df.to_sql('testdata', engine, if_exists='replace')

The issue only remains if you still use a DBAPI connection instead of a sqlalchemy engine, but that option is deprecated and it is recommended to provide a sqlalchemy engine to to_sql.
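(Side note: later pandas versions also accept a dtype argument to to_sql if you want explicit control over the column type rather than the default TEXT; a sketch, assuming a url column as in the question:)

from sqlalchemy.types import VARCHAR

# pin the column type explicitly instead of the default TEXT
# (assumes a 'url' column and a pandas version that supports the dtype argument)
df.to_sql('testdata', engine, if_exists='replace', dtype={'url': VARCHAR(1000)})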

Upvotes: 9
