Reputation: 59245
We were discussing with a customer whether it would be faster to PUT files into Snowflake with SnowSQL or by writing custom code with the JDBC driver.
One opinion was that JDBC would be faster than SnowSQL because Java is faster than Python, the language SnowSQL is written in. Not everyone agreed.
How can we tell which is faster?
Upvotes: 0
Views: 775
Reputation: 59245
We can write some minimal code to compare a PUT between Python and Java.
Let's start with the Java code:
import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import net.snowflake.client.jdbc.SnowflakeConnection;

public class App {
    // Fill in your own credentials.
    static String user = "";
    static String password = "";
    private static final String TMP_TEST_CSV = "/Users/fhoffa/Downloads/pageviews-20210601-000000.gz";

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the stream and the connection even if the upload fails.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:snowflake://your-account.snowflakecomputing.com/?db=temp&role=sysadmin&schema=public", user, password);
             FileInputStream fileInputStream = new FileInputStream(new File(TMP_TEST_CSV))) {
            // Arguments: stage name, destination prefix, input stream, destination file name, compress flag.
            conn.unwrap(SnowflakeConnection.class)
                .uploadStream("my_int_stage", "testUploadStream", fileInputStream, "destFile.csv", true);
        }
    }
}
It took 47 seconds to PUT this file with Java. I also set the compression option (the last argument to uploadStream) to false to see whether that would change much, and the whole process then took 46 seconds.
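One way to take this kind of measurement from inside the program, as a minimal sketch and not necessarily how the numbers above were taken. `PutTimer` and `timeSeconds` are hypothetical names for illustration, not part of the Snowflake driver, and the `Thread.sleep` call is only a stand-in for the upload:

```java
import java.util.concurrent.Callable;

final class PutTimer {
    /** Runs the task once and returns the elapsed wall-clock time in seconds. */
    static double timeSeconds(Callable<?> task) throws Exception {
        long start = System.nanoTime();  // monotonic clock, unaffected by system clock changes
        task.call();
        return (System.nanoTime() - start) / 1_000_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; replace the lambda body with the uploadStream call to time a real PUT.
        double seconds = timeSeconds(() -> {
            Thread.sleep(100);
            return null;
        });
        System.out.printf("Elapsed: %.3f s%n", seconds);
    }
}
```

If you wrap only the uploadStream call, connection setup is excluded from the result; timing the whole program includes JVM startup and login, which may not be counted the same way in the SnowSQL figure.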
Meanwhile with SnowSQL I did this:
#(no warehouse)@TEMP.PUBLIC> put 'file:///Users/fhoffa/Downloads/pageviews-20210601-000000.gz' @~/my_int_stage;
pageviews-20210601-000000.gz(38.59MB): [##########] 100.00% Done (41.330s, 0.93MB/s).
+------------------------------+------------------------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source | target | source_size | target_size | source_compression | target_compression | status | message |
|------------------------------+------------------------------+-------------+-------------+--------------------+--------------------+----------+---------|
| pageviews-20210601-000000.gz | pageviews-20210601-000000.gz | 40460014 | 40460014 | GZIP | GZIP | UPLOADED | |
+------------------------------+------------------------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 43.344s
It took 43 seconds, which is about 9% less than the 47 seconds that JDBC took. So there's no reason to think that Java will be much faster than Python.
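The comparison is simple arithmetic on the reported numbers. A small sketch that reproduces the throughput and the relative time difference, using the file size and timings shown above; `PutStats` and its method names are hypothetical helpers, and 1 MB is taken as 1024 * 1024 bytes because that is what makes 40460014 bytes come out as the 38.59MB in the SnowSQL progress line:

```java
final class PutStats {
    /** Throughput in MB/s, where 1 MB = 1024 * 1024 bytes. */
    static double megabytesPerSecond(long bytes, double seconds) {
        return bytes / (1024.0 * 1024.0) / seconds;
    }

    /** Fraction of time saved by the faster run, relative to the slower run. */
    static double relativeSaving(double slowerSeconds, double fasterSeconds) {
        return (slowerSeconds - fasterSeconds) / slowerSeconds;
    }

    public static void main(String[] args) {
        long fileBytes = 40_460_014L;  // source_size reported by SnowSQL
        System.out.printf("SnowSQL transfer: %.2f MB/s%n", megabytesPerSecond(fileBytes, 41.330));
        System.out.printf("JDBC end to end:  %.2f MB/s%n", megabytesPerSecond(fileBytes, 47.0));
        System.out.printf("Time saved by SnowSQL: %.1f%%%n", 100 * relativeSaving(47.0, 43.0));
    }
}
```

A single run per tool is a small sample, so a difference of this size could easily be network noise.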
Your results might vary! On a slow network, most of the time is spent on the transfer itself rather than in client code, so the choice of language matters little. Where compression is a significant part of the work, running it in Java might be faster.
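To get a feel for how much of the total could be compression rather than network, you can time gzip locally. This is a rough sketch using only the Java standard library; `GzipTimer` is a hypothetical name, and the synthetic CSV-like data is an assumption standing in for a real uncompressed file (an already gzipped file like the one used here leaves little for compression to do):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipTimer {
    /** Gzips the input in memory and returns the compressed bytes. */
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        }  // closing the gzip stream writes the trailer
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic, repetitive CSV-like rows as a stand-in for a real file.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500_000; i++) {
            sb.append("en,Main_Page,").append(i % 100).append(",0\n");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] compressed = gzip(data);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.printf("Compressed %d bytes to %d bytes in %.3f s%n",
                data.length, compressed.length, seconds);
    }
}
```

If the local compression time is a small fraction of the upload time, the language doing the compression is unlikely to decide the outcome.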
The file I used for this example is a compressed CSV (pageviews-20210601-000000.gz, 38.59MB).
The basic setup in Snowflake to create the stage:
use role sysadmin;
use schema temp.public;
create stage my_int_stage;
Upvotes: 2