Reputation: 316
Can we write data directly into a Snowflake table using Python, without using the Snowflake internal stage?
Writing to a stage first, transforming the data, and then loading it into the table seems like an auxiliary step. Can it be done in one step, just like a JDBC connection to an RDBMS?
Upvotes: 5
Views: 30579
Reputation: 11449
Java:
Load Driver Class:
Class.forName("net.snowflake.client.jdbc.SnowflakeDriver")
Maven:
Add following code block as a dependency
<dependency>
    <groupId>net.snowflake</groupId>
    <artifactId>snowflake-jdbc</artifactId>
    <version>{version}</version>
</dependency>
Spring:
application.yml:
spring:
  datasource:
    hikari:
      maximumPoolSize: 4 # Specify maximum pool size
      minimumIdle: 1     # Specify minimum pool size
      driver-class-name: net.snowflake.client.jdbc.SnowflakeDriver
Python:
import pyodbc
# pyodbc connection string
conn = pyodbc.connect("Driver={SnowflakeDSIIDriver}; Server=XXX.us-east-2.snowflakecomputing.com; Database=VAQUARKHAN_DB; schema=public; UID=username; PWD=password")
# Cursor
cur = conn.cursor()
# Execute a SQL statement to get the current date and store the result in the cursor
cur.execute("select current_date;")
# Display the content of the cursor
row = cur.fetchone()
print(row)
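To address the original question of writing rows without a stage: a plain parameterized INSERT works over the same pyodbc connection. A minimal sketch, assuming a hypothetical target table MY_TABLE(ID INT, NAME STRING) that already exists; the connection string placeholders are the same as above.

import pyodbc

# Same DSN-less connection string as above (placeholders)
conn = pyodbc.connect("Driver={SnowflakeDSIIDriver}; Server=XXX.us-east-2.snowflakecomputing.com; Database=VAQUARKHAN_DB; schema=public; UID=username; PWD=password")
cur = conn.cursor()

# Parameterized INSERT -- rows go straight into the table, no stage involved
rows = [(1, 'alpha'), (2, 'beta')]
cur.executemany("insert into MY_TABLE (ID, NAME) values (?, ?)", rows)
conn.commit()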
Apache Spark:
<dependency>
    <groupId>net.snowflake</groupId>
    <artifactId>spark-snowflake_2.11</artifactId>
    <version>2.5.9-spark_2.4</version>
</dependency>
Code:
import org.apache.spark.sql.DataFrame
// Use the Databricks secrets utility (dbutils) to get Snowflake credentials.
val user = dbutils.secrets.get("data-warehouse", "<snowflake-user>")
val password = dbutils.secrets.get("data-warehouse", "<snowflake-password>")
val options = Map(
  "sfUrl" -> "<snowflake-url>",
  "sfUser" -> user,
  "sfPassword" -> password,
  "sfDatabase" -> "<snowflake-database>",
  "sfSchema" -> "<snowflake-schema>",
  "sfWarehouse" -> "<snowflake-cluster>"
)
// Generate a simple dataset containing five values and write the dataset to Snowflake.
spark.range(5).write
  .format("snowflake")
  .options(options)
  .option("dbtable", "<snowflake-table>")
  .save()
// Read back the data written by the previous cell.
val df: DataFrame = spark.read
  .format("snowflake")
  .options(options)
  .option("dbtable", "<snowflake-table>")
  .load()
display(df)
The fastest way to load data into Snowflake is from a file.
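If file-based loading is acceptable, the PUT and COPY INTO steps can also be driven directly from the Python connector. A minimal sketch, assuming a local file /tmp/data.csv and an existing table MY_TABLE; all connection parameters are placeholders:

import snowflake.connector

conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="<warehouse>", database="<database>", schema="<schema>"
)
cur = conn.cursor()

# PUT uploads the local file to the table's internal stage (compressed automatically)
cur.execute("PUT file:///tmp/data.csv @%MY_TABLE")

# COPY INTO loads the staged file into the table
cur.execute("COPY INTO MY_TABLE FROM @%MY_TABLE FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")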
Upvotes: 1
Reputation: 82
If someone is having issues with large datasets, try using Dask instead and generate your dataframe partitioned into chunks. You can then use dask.delayed with SQLAlchemy. Here we use Snowflake's native connector method, pd_writer, which under the hood uses write_pandas and ultimately PUT and COPY INTO with a compressed Parquet file. In the end it comes down to your I/O bandwidth: the more throughput you have, the faster the data gets loaded into the Snowflake table. But this snippet provides a decent amount of parallelism overall.
import functools
import dask
import dask.dataframe as dd
from dask.diagnostics import ProgressBar
from snowflake.connector.pandas_tools import pd_writer

# engine is assumed to be a SQLAlchemy engine built with the snowflake-sqlalchemy dialect;
# csv_file_path, table_name, schema_name and if_exists are assumed to be defined
df = dd.read_csv(csv_file_path, blocksize='64MB')

ddf_delayed = df.to_sql(
    table_name.lower(),
    uri=str(engine.url),
    schema=schema_name,
    if_exists=if_exists,
    index=False,
    method=functools.partial(pd_writer, quote_identifiers=False),
    compute=False,
    parallel=True
)

with ProgressBar():
    dask.compute(ddf_delayed, scheduler='threads', retries=3)
Upvotes: 2
Reputation: 7339
The absolute fastest way to load data into Snowflake is from a file on either an internal or external stage. Period. All connectors have the ability to insert the data with standard INSERT commands, but this will not perform as well. That said, many of the Snowflake drivers now transparently use PUT/COPY commands to load large data into Snowflake via internal stage. If this is what you are after, then you can leverage the write_pandas command to load data from a pandas dataframe to Snowflake in a single command. Behind the scenes, it executes the PUT and COPY INTO for you.
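For illustration, a minimal write_pandas sketch; the connection parameters are placeholders and the target table MY_TABLE is hypothetical and assumed to exist with matching columns:

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="<warehouse>", database="<database>", schema="<schema>"
)

df = pd.DataFrame({"ID": [1, 2, 3], "NAME": ["a", "b", "c"]})

# Stages the dataframe as compressed Parquet files and runs PUT + COPY INTO behind the scenes;
# note that by default the connector quotes the table name, so the name is matched case-sensitively
success, nchunks, nrows, _ = write_pandas(conn, df, "MY_TABLE")
print(success, nchunks, nrows)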
I highly recommend this pattern over INSERT commands in any driver. And I would also recommend transforms be done AFTER loading to Snowflake, not before.
Upvotes: 18