flyingmeatball
flyingmeatball

Reputation: 7997

pandas to_sql all columns as nvarchar

I have a pandas dataframe that is dynamically created with columns names that vary. I'm trying to push them to sql, but don't want them to go to mssqlserver as the default datatype "text" (can anyone explain why this is the default? Wouldn't it make sense to use a more common datatype?)

Does anyone know how I can specify a datatype for all columns?

column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = #Data type for all columns#)

the dtype argument takes a dict, and since I don't know what the columns will be it is hard to set them all to be 'sqlalchemy.types.NVARCHAR'

This is what I would like to do:

column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = 'sqlalchemy.types.NVARCHAR')

Any help/understanding of how best to specify all column types would be much appreciated!

Upvotes: 37

Views: 75625

Answers (4)

Georgii Savelev
Georgii Savelev

Reputation: 101

You can pass a sclalar dtype like sqlalchemy.NVARCHAR(None) to df.to_sql.

It will cause the Pylance reportArgumentType, but it will work nevertheless (tested with mssql), so it is to be considered a type hint problem on the pandas site

column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = sqlalchemy.NVARCHAR(None))

Upvotes: 0

Eric Kramer
Eric Kramer

Reputation: 320

Python has a pretty versatile collections library. The defaultdict class allows us to dynamically specify--via a lambda expression--what value should be returned when keys are missing.

Putting this to use for your example:

from sqlalchemy.types import NVARCHAR
from collections import defaultdict
always_nvarchar = defaultdict(lambda: NVARCHAR(length=255))
# ...
column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = always_nvarchar)

Upvotes: 2

Parfait
Parfait

Reputation: 107577

To use dtype, pass a dictionary keyed to each data frame column with corresponding sqlalchemy types. Change keys to actual data frame column names:

import sqlalchemy
import pandas as pd
...

column_errors.to_sql('load_errors',push_conn, 
                      if_exists = 'append', 
                      index = False, 
                      dtype={'datefld': sqlalchemy.DateTime(), 
                             'intfld':  sqlalchemy.types.INTEGER(),
                             'strfld': sqlalchemy.types.NVARCHAR(length=255)
                             'floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True)
                             'booleanfld': sqlalchemy.types.Boolean})

You may even be able to dynamically create this dtype dictionary given you do not know column names or types beforehand:

def sqlcol(dfparam):    
    
    dtypedict = {}
    for i,j in zip(dfparam.columns, dfparam.dtypes):
        if "object" in str(j):
            dtypedict.update({i: sqlalchemy.types.NVARCHAR(length=255)})
                                 
        if "datetime" in str(j):
            dtypedict.update({i: sqlalchemy.types.DateTime()})

        if "float" in str(j):
            dtypedict.update({i: sqlalchemy.types.Float(precision=3, asdecimal=True)})

        if "int" in str(j):
            dtypedict.update({i: sqlalchemy.types.INT()})

    return dtypedict

outputdict = sqlcol(df)    
column_errors.to_sql('load_errors', 
                     push_conn, 
                     if_exists = 'append', 
                     index = False, 
                     dtype = outputdict)

Upvotes: 73

joris
joris

Reputation: 139162

You can create this dict dynamically if you do not know the column names in advance:

from sqlalchemy.types import NVARCHAR
df.to_sql(...., dtype={col_name: NVARCHAR for col_name in df})

Note that you have to pass the sqlalchemy type object itself (or an instance to specify parameters like NVARCHAR(length=10)) and not a string as in your example.

Upvotes: 50

Related Questions