CodingInCircles

Reputation: 2807

Snowflake pandas pd_writer writes out tables with NULLs

I have a Pandas dataframe that I'm writing out to Snowflake using a SQLAlchemy engine and the to_sql function. It works fine, but I have to use the chunksize option because of a Snowflake limit. This is also fine for smaller dataframes. However, some dataframes are 500k+ rows, and at 15k records per chunk, it takes forever to finish writing to Snowflake.
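
For reference, the slow path looks roughly like this (a sketch; the connection values are the same environment-variable placeholders as in the code further down, and the table name is made up):

import os
import pandas as pd
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

# Sketch of the current, slow approach: plain to_sql with chunksize.
engine = create_engine(
    URL(
        account=os.environ.get("DB_ACCOUNT"),
        user=os.environ.get("DB_USERNAME"),
        password=os.environ.get("DB_PASSWORD"),
        database="dummy_db",
        schema="dummy_schema",
        warehouse=os.environ.get("DB_WAREHOUSE"),
        role=os.environ.get("DB_ROLE"),
    )
)

df = pd.DataFrame([("Mark", 10), ("Luke", 20)], columns=["name", "balance"])

# chunksize is required because of the Snowflake limit mentioned above;
# at 15k records per chunk, a 500k+ row dataframe takes a very long time.
df.to_sql("my_table", engine, if_exists="append", index=False, chunksize=15000)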

I did some research and came across the pd_writer method provided by Snowflake, which apparently loads the dataframe much faster. My Python script does complete faster, and it creates a table with all the right columns and the right row count, but every value in every row is NULL.

I thought it was a NaN to NULL issue and tried everything possible to replace the NaNs with None, and while it does the replacement within the dataframe, by the time it gets to the table, everything becomes NULL.
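
For reference, the kind of NaN-to-None replacement I mean is sketched below; it visibly changes the dataframe in memory, but the written table still comes out all NULL:

import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Mark", None], "balance": [10, np.nan]})

# Replace NaN/NaT with None inside the dataframe...
df = df.where(pd.notnull(df), None)
print(df)

# ...the dataframe now holds None, yet the table written via pd_writer
# still ends up entirely NULL.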

How can I use pd_writer to get these huge dataframes written properly into Snowflake? Are there any viable alternatives?

EDIT: Following Chris' answer, I decided to try with the official example. Here's my code and the result set:

import os
import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
from snowflake.connector.pandas_tools import write_pandas, pd_writer


def create_db_engine(db_name, schema_name):
    return create_engine(
        URL(
            account=os.environ.get("DB_ACCOUNT"),
            user=os.environ.get("DB_USERNAME"),
            password=os.environ.get("DB_PASSWORD"),
            database=db_name,
            schema=schema_name,
            warehouse=os.environ.get("DB_WAREHOUSE"),
            role=os.environ.get("DB_ROLE"),
        )
    )


def create_table(out_df, table_name, idx=False):
    engine = create_db_engine("dummy_db", "dummy_schema")
    connection = engine.connect()

    try:
        out_df.to_sql(
            table_name, connection, if_exists="append", index=idx, method=pd_writer
        )

    except ConnectionError:
        print("Unable to connect to database!")

    finally:
        connection.close()
        engine.dispose()

    return True


df = pd.DataFrame([("Mark", 10), ("Luke", 20)], columns=["name", "balance"])

print(df.head())

create_table(df, "dummy_demo_table")

The code runs with no hitches, but when I look at the table that gets created, it's all NULLs. Again.

This is what dummy_demo_table shows me (screenshot omitted; every cell is NULL).

Upvotes: 9

Views: 8397

Answers (2)

Mahen Ramdhaney

Reputation: 21

I have had this exact same issue; don't despair, there is a solution in sight. When you create a table in Snowflake from the Snowflake worksheet or Snowflake environment, it names the object and all columns and constraints in uppercase. However, when you create the table from Python using the dataframe, the object gets created in the exact case that you specified in your dataframe. In your case that is columns=['name', 'balance']. So when the insert happens, it looks for all-uppercase column names in Snowflake, cannot find them, and does the insert anyway but sets your two columns to NULL, since the columns are created as nullable.

The best way to get past this issue is to create your columns in uppercase in the dataframe: columns=['NAME', 'BALANCE'].
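
For example, building the question's toy dataframe with uppercase column names (a sketch):

import pandas as pd

# Uppercase column names match the identifiers Snowflake resolves during the insert.
df = pd.DataFrame([("Mark", 10), ("Luke", 20)], columns=["NAME", "BALANCE"])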

I do think this is something that Snowflake should address and fix, as it is not expected behavior.

Even if you tried to do a select against your table that has the NULLs, you would get an error, e.g.: select name, balance from dummy_demo_table

You would probably get an error like the following: SQL compilation error: error line 1 at position 7 invalid identifier 'name'

BUT the following will work: SELECT * from dummy_demo_table

Upvotes: 2

CodingInCircles

Reputation: 2807

Turns out, the documentation (arguably, Snowflake's weakest point) is out of sync with reality. This is the real issue: https://github.com/snowflakedb/snowflake-connector-python/issues/329. All it needs is a single character in the column name to be upper case and it works perfectly.

My workaround is to simply do: df.columns = map(str.upper, df.columns) before invoking to_sql.
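
Applied to the toy example from my edit above, the whole thing looks roughly like this (a sketch; same placeholder connection details as before):

import os
import pandas as pd
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
from snowflake.connector.pandas_tools import pd_writer

engine = create_engine(
    URL(
        account=os.environ.get("DB_ACCOUNT"),
        user=os.environ.get("DB_USERNAME"),
        password=os.environ.get("DB_PASSWORD"),
        database="dummy_db",
        schema="dummy_schema",
        warehouse=os.environ.get("DB_WAREHOUSE"),
        role=os.environ.get("DB_ROLE"),
    )
)

df = pd.DataFrame([("Mark", 10), ("Luke", 20)], columns=["name", "balance"])

# Uppercase the column names so they match Snowflake's unquoted identifiers.
df.columns = map(str.upper, df.columns)

with engine.connect() as connection:
    df.to_sql(
        "dummy_demo_table", connection, if_exists="append", index=False, method=pd_writer
    )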

Upvotes: 21
