Reputation: 1557

How to insert multiple rows of a pandas dataframe into Azure Synapse SQL DW using pyodbc?

I am using pyodbc to establish connection with Azure Synapse SQL DW. The connection is successfully established. However when it comes to inserting a pandas dataframe into the database, I am getting an error when I try inserting multiple rows as values. However, it works if I insert rows one by one. Inserting multiple rows together as values used to work fine with AWS Redshift and MS SQL, but fails with Azure Synapse SQL DW. I think the Azure Synapse SQL is T-SQL and not MS-SQL. Nonetheless, I am unable to find any relevant documentation as well.

I have a pandas df named 'df' that looks like this:

student_id  admission_date
    1          2019-12-12
    2          2018-12-08
    3          2018-06-30
    4          2017-05-30
    5          2020-03-11

This code below works fine

import pandas as pd
import pyodbc
#conn object below is the pyodbc 'connect' object

    batch_size = 1
    i = 0
    chunk = df[i:i+batch_size]
    conn.autocommit = True
    sql = 'insert INTO {} values {}'.format('myTable', ','.join(
        str(e) for e in zip(chunk.student_id.values, chunk.admission_date.values.astype(str))))
    print(sql)
    cursor = conn.cursor()
    cursor.execute(sql)

As you can see, it's inserting just 1 row of the 'df'. So, yes, I can loop through and insert one by one but it takes hell lot of time when it comes dataframes of larger sizes

This code below doesn't work when I try to insert all rows together import pandas as pd import pyodbc

batch_size = 5
i = 0
chunk = df[i:i+batch_size]
conn.autocommit = True
sql = 'insert INTO {} values {}'.format('myTable', ','.join(
        str(e) for e in zip(chunk.student_id.values, chunk.admission_date.values.astype(str))))
print(sql)
cursor = conn.cursor()
cursor.execute(sql)

The error I get this one below:

ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Parse error at line: 1, column: 74: Incorrect syntax near ','. (103010) (SQLExecDirectW)")

This is the sample SQL query for 2 rows which fails:

insert INTO myTable values (1, '2009-12-12'),(2, '2018-12-12')

Upvotes: 1

Answers (2)

Jai Janyani

Reputation: 61

COPY statement in Azure Synapse Analytics is a better way for loading your data in Synapse SQL Pool.

COPY INTO test_parquet
FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.parquet'
WITH (
    FILE_FORMAT = myFileFormat,
    CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
)

You can save your pandas dataframe into blob storage, and then trigger the copy command using execute method.

Upvotes: 0

Muppet Bear

Reputation: 152

That's because Azure Synapse SQL does not support multi-row insert via the values constructor. One work around is to chain "select (value list) union all". Your pseudo SQL should look like so:

insert INTO {table}
select {chunk.student_id.values}, {chunk.admission_date.values.astype(str)} union all
...
select {chunk.student_id.values}, {chunk.admission_date.values.astype(str)}

Upvotes: 1

How to insert multiple rows of a pandas dataframe into Azure Synapse SQL DW using pyodbc?

Answers (2)

Related Questions