vk1011
vk1011

Reputation: 7179

Unsigned int for dataframe to_sql using sqlalchemy types

I am unable to assign an unsigned int type when using .to_sql() to write my dataframe to a MySQL database. I can use the other int types, but just am unable to get unsigned. A small representative sample of what I am trying looks like this:

import pandas as pd
from sqlalchemy import create_engine
import sqlalchemy.types as sql_types

db_engine = create_engine('mysql://db_user:db_pass@db_host:db_port/db_schema')

d = {'id': [100,101,102], 'items': [6,10,20000], 'problems': [50,72,2147483649]} # Representative sample dictionary
df = pd.DataFrame(d).set_index('id')

This gives:

>>> df
         items    problems
id
100          6          50
101         10          72
102      20000  2147483649

I write to the database as follows:

df.to_sql('my_table',
          db_engine,
          flavor='mysql',
          if_exists='replace',
          index_label=['id'],
          dtype={'id': sql_types.SMALLINT,
                 'items': sql_types.INT,
                 'problems': sql_types.INT}

But what happens is the value of problems in the last row (id==102) gets truncated to 2147483647 (which is 2^31-1) when written to the db.

There are no other issues in the connection or when writing other standard data types, including int. I could get away by using the sql_types.BIGINT option instead (making the maximum 2^63-1), but that would really just be unnecessary as I know my values would fall below 4294967296 (2^32-1), which is basically the unsigned int maximum.

So question is: How can I assign an unsigned int field using the .to_sql() approach above?

I have used the sqlalchemy types from here. The MySQL types I see are here. I have seen the question here which does get the unsigned int for MySQL, but it is not using the .to_sql() approach I would like to use. If I can simply create the table from the single .to_sql() statement, that would be ideal.

Upvotes: 4

Views: 2135

Answers (1)

joris
joris

Reputation: 139162

To get an unsigned int, you can specify this in the sqlalchemy constructor of the INTEGER type for mysql (see the docs on the mysql types of sqlalchemy):

In [23]: from sqlalchemy.dialects import mysql

In [24]: mysql.INTEGER(unsigned=True)
Out[24]: INTEGER(unsigned=True)

So you can provide this to the dtype argument in to_sql instead of the more general sql_types.INT:

dtype={'problems': mysql.INTEGER(unsigned=True), ...}

Note: you need at least pandas 0.16.0 to have this working.

Upvotes: 2

Related Questions