Vianney
Vianney

Reputation: 1

Cloud SQL / Cloud Function / Panda / Pytds : UnicodeDecodeError: 'charmap' codec can't decode byte

I'm trying to collect all data on an SQL Server bdd hosted in Cloud SQL with a Cloud Function.

I use pytds and pandas to collect the data from SQL Server and I'm using pandas_gbq.to_gbq to insert data into Google Big Query.

I've an issue with some tables with string columns.

The error gets raised when I try to do df = pd.read_sql(query, pool.connect())

Here is my code :

# initialize Connector object
    connector = Connector()
    def getconn() -> pytds.Connection:
        conn = connector.connect(
            instance_connection_name,
            "pytds",
            user=db_user,
            password=db_pass,
            db=db_name,
            bytes_to_unicode=True
            )
        return conn

    pool = sqlalchemy.create_engine(
        "mssql+pytds://",
        creator=getconn,
        pool_size    = 20,
        max_overflow = 0,
        pool_timeout = 600,
        pool_recycle = 1800,
        connect_args={"encoding": "latin1"}
        # ...
    )
    query = ('select * from article')
    df = pd.read_sql(query, pool.connect()).   # Issue with some table (like article for this example)

I have already try to specify the encoding of the database in the create_engine order but it's not sufficient.

Thanks

Upvotes: 0

Views: 154

Answers (1)

Jack Wotherspoon
Jack Wotherspoon

Reputation: 1979

I was able to get it working with the following package versions:

pip install cloud-sql-python-connector==1.4.2 SQLAlchemy==2.0.22 sqlalchemy-pytds==1.0.0 pandas==2.1.1

I made a few slight changes to the code because the query has to be sqlalchemy.text() format:

from google.cloud.sql.connector import Connector
import sqlalchemy

connector = Connector()

# function to return the database connection object
def getconn():
    conn = connector.connect(
        INSTANCE_CONNECTION_NAME,
        "pytds",
        user=DB_USER,
        password=DB_PASS,
        db=DB_NAME,
        # worked with and without bytes_to_unicode
    )
    return conn

# create connection pool with 'creator' argument to our connection object function
pool = sqlalchemy.create_engine(
    "mssql+pytds://",
    creator=getconn,
    # worked with or without encoding
)

query = sqlalchemy.text("SELECT * FROM article")
# connect to connection pool
with pool.connect() as db_conn:
  df = pd.read_sql(query, db_conn)
  print(df.head())

Upvotes: 0

Related Questions