JacobocaJ

Reputation: 51

Python Turbodbc executemanycolumns error: Unable to cast Python instance to C++ type (compile in debug mode for details)

When attempting to use executemanycolumns with SQL Server, I get this error: "Unable to cast Python instance to C++ type (compile in debug mode for details)"

I know that Turbodbc does not handle numpy nulls (NaN or NaT), so to remove them I use:

df= df.where((pd.notnull(df)), None)

After running this, however, I get an error saying a column is not contiguous. These are always columns of int64 dtype. To work around this, I check each array using arr.flags['C_CONTIGUOUS'] and apply np.ascontiguousarray(arr) when it is False. I then receive the "Unable to cast Python instance to C++ type (compile in debug mode for details)" error again.

My dataframe contains no NaNs or NaTs. Any idea as to what is causing this error? Thanks for any help!
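For reference, the cleanup steps described above can be combined like this (the toy dataframe is an assumption; the NaN-to-None replacement and contiguity check mirror the ones in the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the one being exported.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, np.nan, 2.5]})

# Replace NaN/NaT with None, as described above.
df = df.where(pd.notnull(df), None)

# Extract one array per column, forcing C-contiguity where needed.
columns = []
for col in df.columns:
    arr = df[col].values
    if not arr.flags['C_CONTIGUOUS']:
        arr = np.ascontiguousarray(arr)
    columns.append(arr)
```

Every array in `columns` is then C-contiguous, which is what executemanycolumns expects.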

Upvotes: 5

Views: 10078

Answers (3)

Yvo Putter

Reputation: 1

First of all, thanks to erickfis for his post on medium!

After reading the turbodbc docs, I changed his df_valores logic to use numpy's MaskedArrays:

import numpy as np
import pandas as pd
valores_df = [np.ma.MaskedArray(df[col].values, pd.isnull(df[col].values)) for col in df.columns]

This way the NaNs in each column of df are correctly converted to SQL NULLs during

cursor.executemanycolumns(sql, valores_df)

For me, this small change solved the errors I was getting:

  • Unable to cast Python instance to C++ type
  • ODBC error state: 42000 native error code: 8023 message: [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 34 (""): The supplied value is not a valid instance of data type float.
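A minimal end-to-end sketch of that conversion (the toy dataframe is an assumption; masked entries are the ones turbodbc sends as NULL):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values to be sent as SQL NULLs.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# One MaskedArray per column; the mask marks the null positions.
valores_df = [
    np.ma.MaskedArray(df[col].values, pd.isnull(df[col].values))
    for col in df.columns
]
```

The list is then passed straight to cursor.executemanycolumns(sql, valores_df).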

Upvotes: 0

erickfis

Reputation: 1234

I'm the author of

https://medium.com/@erickfis/etl-process-with-turbodbc-1d19ed71510e

Well, in my experience, the problem is always one of:

  • missing values
  • bad column names (encoding, spaces or forbidden names)
  • bad string encoding
  • bad date encoding

I tried to run my routine yesterday, this time against MySQL, and I was getting the error "Unable to cast Python instance to C++ type".

What I did to fix it:

  • fix (normalize - utf8 encoding) column names
  • run fillna for each dtype in my df
  • normalize text (utf8 encoding)
  • extract column values
  • run my turbodbc routine again
  • voila, it works

Yesterday I was extracting the column values before cleaning the data, which is why I was getting the error.
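The "fillna for each dtype" step might look like this (the fill values and the toy dataframe are assumptions; choose replacements that are valid for your schema):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one text column.
df = pd.DataFrame({
    "qty": pd.Series([1, None, 3], dtype="float64"),
    "name": ["a", None, "c"],
})

# Fill missing values per dtype before extracting column arrays.
num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols] = df[num_cols].fillna(0)

obj_cols = df.select_dtypes(include=["object"]).columns
df[obj_cols] = df[obj_cols].fillna("")

# Only now extract the column values for turbodbc.
values = [df[col].values for col in df.columns]
```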

Thus, after carefully cleaning and preparing your data, for each dtype, you should be fine with:

values = [df[col].values for col in df.columns]

colunas = '(' + ', '.join(df.columns) + ')'

val_place_holder = ['?' for col in df.columns]
sql_val = '(' + ', '.join(val_place_holder) + ')'

sql = f"""
INSERT INTO {mydb}.{table} {colunas}
VALUES {sql_val}
"""


with turbo_connection.cursor() as cursor:
    try:
        cursor.executemanycolumns(sql, values)
        turbo_connection.commit()
    except Exception as e:
        turbo_connection.rollback()
        raise

Upvotes: 2

Hillygoose

Reputation: 195

This answer is definitely not a full explanation of why this error occurs, but it is based on what I found when combating the issue myself.

First off, I need to reference this great article, which gave me the function for building the export to SQL: https://medium.com/@erickfis/etl-process-with-turbodbc-1d19ed71510e

At the end of that article there are responses and questions, some of which relate to yours. Among them is a response saying they wrote their dataframe to CSV, read the CSV back in, and then ran the Turbodbc code.
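That CSV round trip is cheap to try (a sketch; the in-memory buffer is an assumption and a file path works the same way). Passing every column through pandas' CSV writer and parser normalizes dtypes and flushes out odd object values:

```python
import io
import pandas as pd

# Hypothetical frame standing in for the one being exported.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Write to CSV and read it straight back before running Turbodbc.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df = pd.read_csv(buf)
```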

I am unsure why this removes the error you are seeing, and to avoid breaking SO rules on answers I am not going to write down my guess at what is going on, because it would probably be unhelpful!

I decided to follow this advice since I was at my wits' end, and thankfully it worked and exported my whole dataframe (20,975 rows and 100 columns) in around 17 seconds!

Some other things I did while chasing other Turbodbc errors are listed below (they may not relate directly to this specific issue, but they may help with issues you hit further down the line):

  - update numpy to most recent version
  - install dlib from whl file here https://pypi.org/simple/dlib/
  - install CMake using pip (pip install cmake)

I hope this helps and that someone else can provide you a more thorough explanation of what is going on!

Upvotes: 1
