How to split Python dataframe type float64 column into multiple columns

Question

I need to run some calculations on data pulled from a sales table using pyodbc. I am able to pull the data, and I thought I would load it into a pandas dataframe. When the dataframe loads, it has all of my data in one column, when in reality there are 5 separate columns.

query = """SELECT OD.OrderNum, OD.Discount,OD.OrderQty,OD.UnitPrice, (a.OurReqQty - (a.OurJobShippedQty + a.OurStockShippedQty)) AS RemainingQty
        FROM PUB.OrderDtl AS OD
        INNER JOIN PUB.OrderRel AS a ON (OD.Company = a.Company) AND (OD.OrderNum = a.OrderNum) AND (OD.OrderLine = a.OrderLine)
        WHERE (a.OpenRelease = 1)"""
print (query)
cnxn = pyodbc.connect(connection_string)
cursor = cnxn.cursor()
cursor.execute(query)
ab = list(cursor.fetchall())
df = pd.DataFrame(ab, columns=["remain"])

The fetchall() call returns this:

[(115702, Decimal('0.00'), Decimal('25.00'), Decimal('145.00000'), Decimal('25.00')), 
(115793, Decimal('0.00'), Decimal('20.00'), Decimal('823.00000'), Decimal('20.00')),
(115793, Decimal('0.00'), Decimal('20.00'), Decimal('823.00000'), Decimal('20.00')), 
(116134, Decimal('0.00'), Decimal('10.00'), Decimal('587.00000'), Decimal('5.00')),
(116282, Decimal('0.00'), Decimal('1.00'), Decimal('699.95000'), Decimal('1.00'))]

When I load that into a dataframe it looks like this.

                          remain
0  [115702, 0.00, 25.00, 145.00000, 25.00]
1  [115793, 0.00, 20.00, 823.00000, 20.00]
2  [115793, 0.00, 20.00, 823.00000, 20.00]
3   [116134, 0.00, 10.00, 587.00000, 5.00]
4    [116282, 0.00, 1.00, 699.95000, 1.00]

I tried converting this to a string and splitting it:

df.index = df.index.map(str)
df_split = df["remain"].str.split(', ', 1)

But my split looks like

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN

I assume this is a formatting issue, but I don't know where to start. I figured it would be easiest to split the column if it were a string, but maybe I am missing something.

I thought this post would help, but I think it requires me to export the data and then read it back in.

I would greatly appreciate any help.

Gord Thompson · Accepted Answer

The behaviour you are seeing occurs because .fetchall() in pyodbc does not return a list of tuples; it returns a list of pyodbc.Row objects.

You should be able to fill your DataFrame directly by using pandas' read_sql function:

import pandas as pd
import pyodbc

query = """\
SELECT OD.OrderNum,
    OD.Discount,
    OD.OrderQty,
    OD.UnitPrice,
    (a.OurReqQty - (a.OurJobShippedQty + a.OurStockShippedQty)) AS RemainingQty
FROM PUB.OrderDtl AS OD
INNER JOIN PUB.OrderRel AS a ON (OD.Company = a.Company)
    AND (OD.OrderNum = a.OrderNum)
    AND (OD.OrderLine = a.OrderLine)
WHERE (a.OpenRelease = 1)
"""
cnxn = pyodbc.connect(connection_string)
df = pd.read_sql(query, cnxn)
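
If you would rather keep the cursor-based code, one possible alternative is to convert each Row to a plain tuple and take the column names from cursor.description before building the DataFrame. The sketch below is only an illustration: it uses an in-memory sqlite3 database with sqlite3.Row as a stand-in for the pyodbc connection (an assumption, so the example runs without a database server), and rows_to_records is a hypothetical helper name, not part of pyodbc or pandas.

```python
import sqlite3


def rows_to_records(cursor):
    """Return (column_names, list_of_tuples) for the cursor's pending result set.

    Hypothetical helper: relies only on the DB-API (PEP 249) attributes
    cursor.description and cursor.fetchall().
    """
    # The first item of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    # tuple(row) turns each Row object into a plain tuple, one value per column.
    records = [tuple(row) for row in cursor.fetchall()]
    return columns, records


# Stand-in for the pyodbc connection (assumption: sqlite3 instead of the real database).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows come back as Row objects, not tuples
cur = conn.cursor()
cur.execute(
    "CREATE TABLE OrderDtl ("
    "OrderNum INTEGER, Discount REAL, OrderQty REAL, "
    "UnitPrice REAL, RemainingQty REAL)"
)
cur.executemany(
    "INSERT INTO OrderDtl VALUES (?, ?, ?, ?, ?)",
    [(115702, 0.0, 25.0, 145.0, 25.0), (116134, 0.0, 10.0, 587.0, 5.0)],
)
cur.execute(
    "SELECT OrderNum, Discount, OrderQty, UnitPrice, RemainingQty FROM OrderDtl"
)

columns, records = rows_to_records(cur)
conn.close()
```

With a pyodbc cursor the same helper should apply, since pyodbc cursors also expose description and fetchall(). The two return values can then be passed to pd.DataFrame(records, columns=columns), which gives one DataFrame column per query column.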
