Reputation: 438
I have just started working on a cloud database project involving some big data processing and need to quickly migrate data directly from BigQuery to Snowflake. I was wondering whether there is any direct way to push data from BigQuery to Snowflake without any intermediate storage.
Please let me know if there is a utility, an API, or any other mechanism BigQuery provides to push data to Snowflake. Appreciate your help.
Upvotes: 2
Views: 8797
Reputation: 1
I followed the recommendations from the best answer and also used a YouTube video that helped me set up the connection between Google Cloud Storage and Snowflake (www.youtube.com/watch?v=4SBNUNMotRY). The most important point is creating the integration:
CREATE OR REPLACE STORAGE INTEGRATION gcs_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://bucket');

GRANT USAGE ON INTEGRATION gcs_int TO ROLE name;
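(Not shown in the original answer, but part of Snowflake's standard GCS integration setup: after creating the integration, describe it to find the Google service account Snowflake generated, and grant that account read access to the bucket in GCP IAM. A minimal sketch, assuming the integration above:)
-- The STORAGE_GCP_SERVICE_ACCOUNT property in the output is the service account
-- that must be granted access on the bucket (e.g. Storage Object Viewer) in GCP IAM.
DESC STORAGE INTEGRATION gcs_int;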
After establishing the connection, you need to create a table into which the data will be loaded, and then simply copy it:
COPY INTO table_name (column_names)
FROM 'gcs://bucket'
STORAGE_INTEGRATION = gcs_int
FILE_FORMAT = (TYPE = 'CSV'
               SKIP_HEADER = 1
               FIELD_OPTIONALLY_ENCLOSED_BY = '"'
               TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS UTC');
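(For completeness, a minimal sketch of the table-creation step mentioned above; the column names and types here are placeholders, not from the original setup:)
CREATE OR REPLACE TABLE table_name (
    id NUMBER,
    name STRING,
    created_at TIMESTAMP_NTZ
);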
This is how I implemented the recommendations and configured the connection between GCS and Snowflake.
Upvotes: 0
Reputation: 121
If the Snowflake instance is hosted on a different cloud vendor (e.g. AWS or Azure), there is a tool I wrote that could be of use, called sling. There is a blog post that covers this in detail, but essentially you run one command after setting up your credentials:
$ sling run --src-conn BIGQUERY --src-stream segment.team_activity --tgt-conn SNOWFLAKE --tgt-object public.activity_teams --mode full-refresh
11:37AM INF connecting to source database (bigquery)
11:37AM INF connecting to target database (snowflake)
11:37AM INF reading from source database
11:37AM INF writing to target database [mode: full-refresh]
11:37AM INF streaming data
11:37AM INF dropped table public.activity_teams
11:38AM INF created table public.activity_teams
11:38AM INF inserted 77668 rows
11:38AM INF execution succeeded
Upvotes: 0
Reputation: 5526
If you want to move all your data from BigQuery to Snowflake, I have a few assumptions:
1. You already have all your data in GCS itself.
2. You can connect to the Snowflake cluster from GCP.
Now your problem shrinks down to moving the data directly from the GCS bucket into Snowflake, so you can run a COPY command like:
COPY INTO [<namespace>.]<table_name>
  FROM { internalStage | externalStage | externalLocation }
  [ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
  [ PATTERN = '<regex_pattern>' ]
  [ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                      TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
  [ copyOptions ]
  [ VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS ]
where the external location for GCS is:
externalLocation ::=
  'gcs://<bucket>[/<path>]'
  [ STORAGE_INTEGRATION = <integration_name> ]
  [ ENCRYPTION = ( [ TYPE = 'GCS_SSE_KMS' ] [ KMS_KEY_ID = '<string>' ] | [ TYPE = NONE ] ) ]
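A filled-in sketch of that syntax, with placeholder bucket, integration, and table names (not from the original answer):
COPY INTO mydb.public.target_table
FROM 'gcs://my-bucket/exports/'
STORAGE_INTEGRATION = gcs_int
PATTERN = '.*\.csv'
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);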
The relevant documentation can be found here and here.
Upvotes: 2
Reputation:
If this is a one-time need, you can perform it in two steps: export the table data to Google Cloud Storage, then bulk load it from Cloud Storage into Snowflake.
For the export stage, use BigQuery's inbuilt export functionality or the BigQuery Storage API if the data is very large. All of the export formats supported by BigQuery (CSV, JSON or AVRO) are readily supported by Snowflake, so a data transformation step may not be required.
Once the export is ready at the target Cloud Storage location, use Snowflake's COPY INTO <table> with the external location option (or a named external stage) to load the files into a Snowflake-managed table.
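As a sketch of the two steps (the dataset, bucket, integration, and table names below are placeholders):
-- Step 1 (BigQuery): export the table to GCS; the URI must contain a wildcard.
EXPORT DATA OPTIONS(
  uri = 'gs://my-bucket/bq-export/part-*.csv',
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT * FROM my_dataset.my_table;

-- Step 2 (Snowflake): bulk load the exported files, assuming a storage
-- integration (or named external stage) pointing at the same bucket.
COPY INTO my_db.public.my_table
FROM 'gcs://my-bucket/bq-export/'
STORAGE_INTEGRATION = gcs_int
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);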
Upvotes: 4