mortysporty

Reputation: 2889

pandas read sql db2 corrupts decimal

I am trying to read a table from DB2. It seems, however, that the decimal separator is simply ignored, so values come out multiplied by 100. For instance, 100.50 becomes 10050.0 when read into a pandas dataframe.

BTW, I am from Norway, so decimals are denoted by a comma (,) rather than a period (.). I don't know if this matters.

I use the following SQL (in WinSQL Lite):

SELECT CUSTOMER_ID, FUND_NAME, SAVINGS_AMOUNT
FROM SAVINGS_TABLE
WHERE CUSTOMER_ID = 1

, which gives:

CUSTOMER_ID, FUND_NAME, SAVINGS_AMOUNT
          1    ASSET_X       259131,72
          1    ASSET_Y       718533,33   

I know this is correct, so the issue is not the data.

In Spyder, using Python, I have

import pandas as pd
import pyodbc as po

DSN = 'MY_DSN'
UID = 'MY_USER'
PWD = 'MY_PASSWORD'
CON = po.connect('DSN={0}; UID={1}; PWD={2}'.format(DSN, UID, PWD))

SQL = """SELECT CUSTOMER_ID, FUND_NAME, SAVINGS_AMOUNT
FROM SAVINGS_TABLE
WHERE CUSTOMER_ID = 1"""

df = pd.read_sql(SQL, CON)

df
Out[16]: 
    CUSTOMER_ID    FUND_NAME    SAVINGS_AMOUNT
0             1      ASSET_X        25913172.0
1             1      ASSET_Y        71853333.0

What is happening here, and how can I fix it?

The column SAVINGS_AMOUNT is DECIMAL with Size=9 and Scale=2. Surely the issue is somehow related to this. I guess I have to set a parameter in pd.read_sql to interpret the column.

I know I can simply divide by 100 to correct this, but I don't want to do that. I want to read the correct numbers.

Also I'm using Windows.

Upvotes: 14

Views: 3858

Answers (6)

quiquee

Reputation: 1

Using SELECT 1*SAVINGS_AMOUNT provides the correct result.

Upvotes: -1

Paul Lilley

Reputation: 46

You may find it helps to explicitly set the decimal separator to one used in Norway

pyodbc.setDecimalSeparator(',')
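One possible explanation for why the separator matters, sketched under the assumption that the driver hands the value back as text formatted for the client locale (e.g. 259131,72) and that the parser keeps only digits, the sign, and the separator it expects. The function parse_decimal_text below is a hypothetical stand-in for illustration, not pyodbc's actual code:

```python
from decimal import Decimal

def parse_decimal_text(text, separator="."):
    """Hypothetical parser: keep digits, the sign and the expected separator; drop the rest."""
    kept = "".join(ch for ch in text if ch.isdigit() or ch in ("-", separator))
    # Normalise the expected separator to '.' so Decimal can read it.
    return Decimal(kept.replace(separator, "."))

# Parser expects '.', so the ',' is silently dropped and the scale is lost.
wrong = parse_decimal_text("259131,72")
# Parser is told that ',' is the separator, so the scale survives.
right = parse_decimal_text("259131,72", separator=",")
```

Under these assumptions, the first call reproduces the inflated value from the question, and the second one gives the amount with its two decimals intact.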

Upvotes: 3

TMarks

Reputation: 564

Have you tried setting pd.read_sql(SQL, CON, coerce_float=False)? This may be an issue with the conversion from decimal to float.

See https://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.read_sql.html

EDIT:

If that does not work, you can try casting the decimal column to a string in SQL and then, once in Python, replacing the comma with a period and casting to float. A rather resource-intensive answer, but it gets the job done.

Upvotes: 3

Silenced Temporarily

Reputation: 1004

In read_csv you can specify the decimal separator, but you cannot do this in read_sql. Your locale may not be set correctly. Before read_sql, try this:

import locale
locale.setlocale(locale.LC_NUMERIC, "no_NO")

Alternatively, if that isn't working, you could try casting the problematic column as a string in your SQL query, and then doing a .replace to convert the comma to a period and then finally converting it back to a number. This doesn't require you to know the precision beforehand.
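A minimal sketch of the string-conversion step, assuming the query already returns the column as text and the values arrive as strings with a comma as the decimal separator. The helper name to_decimal and the sample rows are illustrative only:

```python
from decimal import Decimal

def to_decimal(text):
    """Convert a string such as '259131,72' to an exact Decimal."""
    # Strip padding, then swap the comma for a period before parsing.
    return Decimal(text.strip().replace(",", "."))

# Sample values shaped like the WinSQL output in the question.
rows = ["259131,72", "718533,33   "]
amounts = [to_decimal(value) for value in rows]
```

With a dataframe, the same helper could be applied with something like df["SAVINGS_AMOUNT"].map(to_decimal). Swapping Decimal for float would work too if exact decimal arithmetic is not needed.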

Upvotes: 3

M Danish

Reputation: 478

Please try this. It may work.

import numpy as np
np.round(df, decimals=2)  # df is the dataframe returned by pd.read_sql

Upvotes: 3

ccarpenter32

Reputation: 1077

Not sure if you've tried forcing the SQL input:

SQL = """SELECT CUSTOMER_ID, FUND_NAME, CAST(SAVINGS_AMOUNT AS DECIMAL(20,2)) AS SAVINGS_AMOUNT
FROM SAVINGS_TABLE
WHERE CUSTOMER_ID = 1"""

Upvotes: 4
