Reputation: 896
I was using df.to_sql(con=con_mysql, name='testdata', if_exists='replace', flavor='mysql')
to export a data frame into mysql. However, I discovered that the columns with long string content (such as url) are truncated to 63 characters. I received the following warning from the ipython notebook when I exported:
/usr/local/lib/python2.7/site-packages/pandas/io/sql.py:248: Warning: Data truncated for column 'url' at row 3 cur.executemany(insert_query, data)
There were other warnings in the same style for different rows.
Is there anything I can tweak to export the full data properly? I could set up the correct data schema in mysql and then export to that, but I am hoping a tweak can just make it work straight from python.
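For reference, by setting up the schema myself I mean roughly the following sketch (the id/url columns and VARCHAR(1000) are just placeholders for what is actually in my frame):
# manual alternative: create the table with an explicit schema first,
# then append the dataframe into it instead of letting pandas create it
cur = con_mysql.cursor()
cur.execute("""
    CREATE TABLE testdata (
        id BIGINT,
        url VARCHAR(1000)
    )
""")
con_mysql.commit()

df.to_sql(con=con_mysql, name='testdata', if_exists='append', flavor='mysql')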
Upvotes: 3
Views: 7747
Reputation: 4339
Inspired by @joris's answer, I decided to hard-code the change into the pandas source and recompile.
cd /usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io
sudo pico sql.py
changed line 871 from
'mysql': 'VARCHAR (63)',
to
'mysql': 'VARCHAR (255)',
then recompiled just that file:
sudo python -m py_compile sql.py
restarted my script, and the to_sql() function wrote the table.
(I expected that the recompile would break pandas, but it seems not to have.)
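If you want to be sure the file you edited is the one Python actually loads, a quick check like this (it just prints the module path) should point at the sql.py (or its .pyc) edited above:
# confirm which pandas/io/sql.py is actually being imported
import pandas.io.sql
print(pandas.io.sql.__file__)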
Here is my script to write a dataframe to mysql, for reference.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sqlalchemy
from sqlalchemy import create_engine
df = pd.read_csv('10k.csv')
## ... dataframe munging
df = df.where(pd.notnull(df), None) # workaround for NaN bug
engine = create_engine('mysql://user:password@localhost:3306/dbname')
con = engine.connect().connection
df.to_sql("issues", con, 'mysql', if_exists='replace', index=True, index_label=None)
Upvotes: 5
Reputation: 139172
If you are using pandas 0.13.1 or older, this limit of 63 characters is indeed hardcoded, because of this line in the code: https://github.com/pydata/pandas/blob/v0.13.1/pandas/io/sql.py#L278
As a workaround, you could monkeypatch that get_sqltype function:
import numpy as np
from datetime import datetime, date

from pandas.io import sql

def get_sqltype(pytype, flavor):
    sqltype = {'mysql': 'VARCHAR (63)',   # <-- change this value to something sufficiently larger
               'sqlite': 'TEXT'}
    if issubclass(pytype, np.floating):
        sqltype['mysql'] = 'FLOAT'
        sqltype['sqlite'] = 'REAL'
    if issubclass(pytype, np.integer):
        sqltype['mysql'] = 'BIGINT'
        sqltype['sqlite'] = 'INTEGER'
    if issubclass(pytype, np.datetime64) or pytype is datetime:
        sqltype['mysql'] = 'DATETIME'
        sqltype['sqlite'] = 'TIMESTAMP'
    if pytype is date:
        sqltype['mysql'] = 'DATE'
        sqltype['sqlite'] = 'TIMESTAMP'
    if issubclass(pytype, np.bool_):
        sqltype['sqlite'] = 'INTEGER'
    return sqltype[flavor]

# replace the pandas version with the patched one
sql.get_sqltype = get_sqltype
And then just using your code should work:
df.to_sql(con=con_mysql, name='testdata', if_exists='replace', flavor='mysql')
Starting from pandas 0.14, the sql module uses sqlalchemy under the hood, and strings are converted to the sqlalchemy TEXT type, which is converted to the mysql TEXT type (and not VARCHAR). This will also allow you to store strings longer than 63 characters:
engine = sqlalchemy.create_engine('mysql://scott:tiger@localhost/foo')
df.to_sql('testdata', engine, if_exists='replace')
Only if you still use a DBAPI connection instead of a sqlalchemy engine does the issue remain, but that option is deprecated and it is recommended to provide a sqlalchemy engine to to_sql.
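To make the difference concrete, a small sketch (reusing con_mysql from the question and engine from above):
# deprecated legacy path: DBAPI connection + flavor, still limited to VARCHAR (63)
df.to_sql('testdata', con_mysql, flavor='mysql', if_exists='replace')

# recommended path: pass the sqlalchemy engine, string columns become TEXT
df.to_sql('testdata', engine, if_exists='replace')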
Upvotes: 9