euri10
euri10

Reputation: 2626

pandas read_sql drops dot in column names

is that a bug or I'm doing specifically something wrong ? I create a df, put it in a sql table, df and table have a column with a dot in it. now when I read the df from the sql table, column names aren't the same. I wrote this little piece of code so that people can test it.

import sqlalchemy
import pandas as pd
import numpy as np

engine = sqlalchemy.create_engine('sqlite:///test.sqlite')
dfin = pd.DataFrame(np.random.randn(10,2), columns=['column with a . dot', 'without'])
print(dfin)
dfin.to_sql('testtable', engine, if_exists='fail')


tables = engine.table_names()
for table in tables:
    sql = 'SELECT t.* FROM "' + table + '" t'
    dfout = pd.read_sql(sql, engine)
    print(dfout.columns)
    print dfout

Upvotes: 3

Views: 3460

Answers (1)

joris
joris

Reputation: 139172

Solution is to pass sqlite_raw_colnames=True to your engine

In [141]: engine = sqlalchemy.create_engine('sqlite:///', execution_options={'sqlite_raw_colnames':True})

In [142]: dfin.to_sql('testtable', engine, if_exists='fail')

In [143]: pd.read_sql("SELECT * FROM testtable", engine).head()
Out[143]:
   index  column with a . dot   without
0      0             0.213645  0.321328
1      1            -0.511033  0.496510
2      2            -1.114511 -0.030571
3      3            -1.370342  0.359123
4      4             0.101111 -1.010498

SQLAlchemy does this stripping of dots deliberately (in some cases SQLite may store col names as "tablename.colname"), see eg sqlalchemy+sqlite stripping column names with dots? and https://groups.google.com/forum/?hl=en&fromgroups#!topic/sqlalchemy/EqAuTFlMNZk


This seems a bug, but not necessarily in the pandas read_sql function, as this relies on the keys method of the SQLAlchemy ResultProxy object to determine the column names. And this seems to truncate the column names:

In [15]: result = engine.execute("SELECT * FROM testtable")

In [16]: result.keys()
Out[16]: [u'index', u' dot', u'without']

So the question is if this is a bug in SQLAlchemy, or that pandas should make a workaround (by eg using result.cursor.description which gives the correct names)

For now, you can also use the sqlite fallback mode, using a DBAPI connection instead of SQLAlchemy engine (as this relies on cursor.description, here the correct column names are used:

In [20]: con = sqlite3.connect(':memory:')

In [21]: dfin.to_sql('testtable', con, if_exists='fail')

In [22]: pd.read_sql("SELECT * FROM testtable", con).head()
Out[22]:
   index  column with a . dot   without
0      0             0.213645  0.321328
1      1            -0.511033  0.496510
2      2            -1.114511 -0.030571
3      3            -1.370342  0.359123
4      4             0.101111 -1.010498

Upvotes: 4

Related Questions