RPalmer
RPalmer

Reputation: 3

Issue with inserting data into Sql server using Python Pandas Dataframe

I am trying to pull data from a REST API and insert it into SQL Server. If we have the script do the PhotoBinary,Filetype together it works but as soon as I add the ID which is an integer we get the error below. Also if I just have it pull ID on its own from the API it works.

I am trying to pull 3 pieces of information

  1. The EmployeeID which is an int.
  2. The Binary String representation of the image
  3. The file type of the original file e.g.: .jpg

The target table is setup as:

Create table Employee_Photo
( 
    EmployeeID  int,
    PhotoBinary varchar(max),
    FileType varchar(10)
)

The Error I get is:

Traceback (most recent call last):
  File "apiphotopullwithid.py", line 64, in <module>
    cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", row['EMPID'],row['Photo'],row['PhotoType'])
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 5 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')
import json
import pandas as pd
import sqlalchemy
import pyodbc
import requests

url = "https://someurl.com/api/PersonPhoto"

headers = {
    'Accept': "application/json",
    'Authorization': "apikey XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    'Content-Type': "application/json",
    'cache-control': "no-cache"
}

response = requests.request("GET", url, headers=headers)
data = json.loads(response.text)


ID,Photo,PhotoType = [],[],[]

for device in data['PersonPhoto']:
    ID.append(device[u'ID'])

    Photo.append(device[u'Photo'])

    PhotoType.append(device[u'PhotoType'])


df = pd.DataFrame([ID,Photo,PhotoType]).T
df.columns = ['EMPID','Photo','PhotoType']
df = df.astype({'EMPID':'Int64'})



connStr = pyodbc.connect(
    "DRIVER={SQL Server};"
    "SERVER=SQLTest;"
    "Database=Intranet123;"
    "Trusted_Connection=yes;"
    #"UID=ConnectME;"
    #"PWD={Password1}"
)
cursor = connStr.cursor()

for index,row in df.iterrows():
cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", row['EMPID'],row['Photo'],row['PhotoType']) 
    connStr.commit()
    cursor.close()
connStr.close()

Upvotes: 0

Views: 4556

Answers (2)

Parfait
Parfait

Reputation: 107567

In most Python database APIs including pyodbc adhering to the PEP 249 specs, the parameters argument in cursor.execute() is usually a sequence (i.e., tuple, list). Therefore, bind all values into an iterable and not as three separate argument values:

sql = "INSERT INTO dbo.Employee_Photo ([EmployeeID],[PhotoBinary],[FileType]) VALUES (?,?,?)"

# TUPLE
cursor.execute(sql, (row['EMPID'], row['Photo'], row['PhotoType']))

# LIST
cursor.execute(sql, [row['EMPID'], row['Photo'], row['PhotoType']])

By the way, avoid the explicit iterrows loop and use implicit loop with executemany using Pandas' DataFrame.values:

# EXECUTE PARAMETERIZED QUERY
sql_cols = ['EMPID', 'Photo', 'PhotoType']
cursor.executemany(sql, df[sql_cols].values.tolist())   
conn.commit()

Actually, you do not even need Pandas as a middle layer (use library for just data science) and interact with original returned json:

# NESTED LIST OF TUPLES
vals = [(int(device[u'ID']), device[u'Photo'], device[u'PhotoType']) \
           for device in data['PersonPhoto']]

cursor.executemany(sql, vals)   
conn.commit()

Upvotes: 2

David Browne - Microsoft
David Browne - Microsoft

Reputation: 88851

You're using the old Windows built-in SQL Server driver. Try the newer one, which you can get from here for multiple platforms.

Don't read too much into the error message. Something is malformed in the network protocol layer.

Can you dump the types and values of the parameters causing the issue. My guess is that the driver is setting the parameter types incorrectly.

EG:

for index,row in df.iterrows():
  empid =  row['EMPID']
  photo = row['Photo']
  photoType = row['PhotoType']

  print("empid is ",type(empid), " photo is ", type(photo), " photoType is ", type(photoType))
  print("empid: ",empid, " photo: ", photo, " photoType: ", photoType)

  cursor.execute("INSERT INTO dbo.Employee_Photo([EmployeeID],[PhotoBinary],[FileType]) values (?,?,?)", empid,photo,photoType) 

connStr.commit()
cursor.close()
connStr.close()

Upvotes: 0

Related Questions