Pearl
Pearl

Reputation: 452

Pandas read_sql changing large number IDs when reading

I transferred an Oracle database to SQL Server and all seems to have went well. The various ID columns are large numbers so I had to use Decimal as they were too large for BigInt.

I am now trying to read the data using pandas.read_sql using pyodbc connection with ODBC Driver 17 for SQL Server. df = pandas.read_sql("SELECT * FROM table1"),con)

The numbers are coming out as float64 and when I try to print them our use them in SQL statements they come out in scientific notation and when I try to use '{:.0f}'.format(df.loc[i,'Id']) It turns several numbers into the same number such as 90300111000003078520832. It is like precision is lost when it goes to scientific notation.

I also tried pd.options.display.float_format = '{:.0f}'.format before the read_sql but this did not help.

Clearly I must be doing something wrong as the Ids in the database are correct.

Any help is appreciated Thanks

Upvotes: 4

Views: 3513

Answers (2)

Ian Kennedy
Ian Kennedy

Reputation: 21

I've had this problem too, especially working with long ids: read_sql works fine for the primary key, but not for other columns (like the retweeted_status_id from Twitter API calls). Setting coerce_float to false does nothing for me, so instead I cast retweeted_status_id to a character format in my sql query.

Using psql, I do:

df = pandas.read_sql("SELECT *, Id::text FROM table1"),con)

But in SQL server it'd be something like

df = pandas.read_sql("SELECT *, CONVERT(text, Id) FROM table1"),con)

or

df = pandas.read_sql("SELECT *, CAST(Id AS varchar) FROM table1"),con)

Obviously there's a cost here if you're asking to cast many rows, and a more efficient option might be to pull from SQL server without using pandas (as a nested list or JSON or something else) which will also preserve your long integer formats.

Upvotes: 2

Gord Thompson
Gord Thompson

Reputation: 123829

pandas' read_sql method has an option named coerce_float which defaults to True and it …

Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

However, in your case it is not useful, so simply specify coerce_float=False.

Upvotes: 5

Related Questions