HimanshuSPaul

Reputation: 316

Writing data into snowflake using Python

Can we write data directly into a Snowflake table with Python, without going through a Snowflake internal stage?

It seems an auxiliary task to write to a stage first, transform the data, and then load it into the table. Can it be done in one step, just like a JDBC connection to an RDBMS?

Upvotes: 5

Views: 30579

Answers (3)

vaquar khan

Reputation: 11449

Java:

Load Driver Class:

Class.forName("net.snowflake.client.jdbc.SnowflakeDriver");

Maven:

Add the following block as a dependency:

<dependency>
  <groupId>net.snowflake</groupId>
  <artifactId>snowflake-jdbc</artifactId>
  <version>{version}</version>
</dependency>

Spring:

application.yml:

spring:
  datasource:
    hikari:
      maximumPoolSize: 4 # Specify maximum pool size
      minimumIdle: 1 # Specify minimum pool size
      driver-class-name: net.snowflake.client.jdbc.SnowflakeDriver

Python:

import pyodbc

# pyodbc connection string (account, database, and credentials are placeholders)
conn = pyodbc.connect("Driver={SnowflakeDSIIDriver}; Server=XXX.us-east-2.snowflakecomputing.com; Database=VAQUARKHAN_DB; schema=public; UID=username; PWD=password")

# Cursor
cur = conn.cursor()

# Execute a SQL statement to get the current date and store the result in the cursor
cur.execute("select current_date;")

# Display the first row of the cursor
row = cur.fetchone()

print(row)
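
To answer the original question with this driver: you can write rows directly with standard INSERT statements, no explicit stage step required. A minimal sketch, assuming a hypothetical table MY_TABLE(ID INT, NAME STRING) that already exists (connection string placeholders as above):

import pyodbc

# Placeholder connection string; substitute your own account and credentials.
conn = pyodbc.connect("Driver={SnowflakeDSIIDriver}; Server=XXX.us-east-2.snowflakecomputing.com; Database=VAQUARKHAN_DB; schema=public; UID=username; PWD=password")
cur = conn.cursor()

# Insert several rows with parameter binding; MY_TABLE is a hypothetical table.
rows = [(1, 'alpha'), (2, 'beta'), (3, 'gamma')]
cur.executemany("INSERT INTO MY_TABLE (ID, NAME) VALUES (?, ?)", rows)

conn.commit()
conn.close()

Note this issues plain INSERTs, so it is convenient for small volumes but slower than a staged load for large ones, as the last answer below explains.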

Apache Spark:

<dependency>
  <groupId>net.snowflake</groupId>
  <artifactId>spark-snowflake_2.11</artifactId>
  <version>2.5.9-spark_2.4</version>
</dependency>

Code

import org.apache.spark.sql.DataFrame

// Use secrets DBUtil to get Snowflake credentials.
val user = dbutils.secrets.get("data-warehouse", "<snowflake-user>")
val password = dbutils.secrets.get("data-warehouse", "<snowflake-password>")

val options = Map(
    "sfUrl" -> "<snowflake-url>",
    "sfUser" -> user,
    "sfPassword" -> password,
    "sfDatabase" -> "<snowflake-database>",
    "sfSchema" -> "<snowflake-schema>",
    "sfWarehouse" -> "<snowflake-cluster>"
)

// Generate a simple dataset containing five values and write the dataset to Snowflake.
spark.range(5).write
    .format("snowflake")
    .options(options)
    .option("dbtable", "<snowflake-database>")
    .save()

// Read the data written by the previous cell back.
val df: DataFrame = spark.read
    .format("snowflake")
    .options(options)
    .option("dbtable", "<snowflake-database>")
    .load()

display(df)

The fastest way to load data into Snowflake is still from a file.


Upvotes: 1

chetan_surwade

Reputation: 82

If you are having issues with large datasets, try Dask instead and generate your dataframe partitioned into chunks. You can then use dask.delayed with SQLAlchemy. Here we use Snowflake's native connector method, pd_writer, which under the hood uses write_pandas and ultimately PUT and COPY INTO with compressed Parquet files. In the end it comes down to your I/O bandwidth: the more throughput you have, the faster the data gets loaded into the Snowflake table. But this snippet provides a decent amount of parallelism overall.
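
The snippet below assumes an existing SQLAlchemy engine (along with csv_file_path, table_name, schema_name, and if_exists variables you define yourself). A minimal sketch of creating the engine with the snowflake-sqlalchemy package, with all account and object names as placeholders:

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

# All values below are placeholders; substitute your own account and objects.
engine = create_engine(URL(
    account='xy12345.us-east-2',
    user='username',
    password='password',
    database='MY_DB',
    schema='PUBLIC',
    warehouse='MY_WH',
))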

import functools
import dask
import dask.dataframe as dd
from dask.diagnostics import ProgressBar
from snowflake.connector.pandas_tools import pd_writer

# Read the CSV into 64 MB partitions.
df = dd.read_csv(csv_file_path, blocksize='64MB')

# Build delayed to_sql tasks, one per partition, writing via pd_writer
# (which PUTs compressed Parquet files and runs COPY INTO under the hood).
ddf_delayed = df.to_sql(
    table_name.lower(),
    uri=str(engine.url),
    schema=schema_name,
    if_exists=if_exists,
    index=False,
    method=functools.partial(pd_writer, quote_identifiers=False),
    compute=False,
    parallel=True,
)

# Execute all partitions on the threaded scheduler, retrying failures.
with ProgressBar():
    dask.compute(ddf_delayed, scheduler='threads', retries=3)

Upvotes: 2

Mike Walton

Reputation: 7339

The absolute fastest way to load data into Snowflake is from a file on either an internal or external stage. Period. All connectors can insert data with standard INSERT commands, but this will not perform as well. That said, many of the Snowflake drivers now transparently use PUT/COPY commands to load large data into Snowflake via an internal stage. If this is what you are after, you can leverage the Python connector's write_pandas function to load data from a pandas dataframe to Snowflake in a single command. Behind the scenes, it executes the PUT and COPY INTO for you.

https://docs.snowflake.com/en/user-guide/python-connector-api.html#label-python-connector-api-write-pandas
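
A minimal sketch of that pattern, assuming a hypothetical table MY_TABLE that already exists with columns matching the dataframe (connection parameters are placeholders):

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Placeholder connection parameters; substitute your own.
conn = snowflake.connector.connect(
    account='xy12345.us-east-2',
    user='username',
    password='password',
    database='MY_DB',
    schema='PUBLIC',
    warehouse='MY_WH',
)

df = pd.DataFrame({'ID': [1, 2, 3], 'NAME': ['a', 'b', 'c']})

# write_pandas PUTs the dataframe to an internal stage and runs COPY INTO,
# returning whether it succeeded, plus chunk and row counts.
success, nchunks, nrows, _ = write_pandas(conn, df, 'MY_TABLE')
print(success, nchunks, nrows)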

I highly recommend this pattern over INSERT commands in any driver. And I would also recommend that transforms be done AFTER loading into Snowflake, not before.

Upvotes: 18
