mkp24

Reputation: 125

Snowflake | Airflow | Google Cloud: How to unload data from Snowflake tables in CSV format and upload it to GCS using Airflow

Is there any way to unload data from Snowflake in CSV format, or can it be written directly as CSV files in Google Cloud Storage?

We are using Composer (Airflow) DAGs to connect to Snowflake, unload data from tables into CSV files, and store them in Google Cloud Storage, to be migrated further downstream later.

What I have tried:

Querying data from a Snowflake table and storing the result in a variable.

What I want to do next:

Convert the data into a CSV file (I have not run the code yet) and migrate it to a GCS bucket, but it seems that Airflow only has a GCSToGCSOperator, which does not help here.

What I am considering:

  1. Whether I should use a plain Python file with a scheduler instead of writing a DAG.
  2. Doing it through Dataflow (Beam) and running it on Composer.

Code:-

from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

def func(**context):
    dwh_hook = SnowflakeHook(snowflake_conn_id="snowflake_conn")
    # get_first returns only the first row of the result set
    result = dwh_hook.get_first("select col1, col2, col3, col4, col5 from table_name where col_name = previous_date_func_here")
    # print(result)
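
One way I am thinking of continuing this function, as a rough sketch only (the GCS connection id, bucket name, and object path are placeholders I made up):

import csv
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

def unload_to_gcs(**context):
    dwh_hook = SnowflakeHook(snowflake_conn_id="snowflake_conn")
    # get_records returns all rows instead of just the first one
    rows = dwh_hook.get_records(
        "select col1, col2, col3, col4, col5 from table_name "
        "where col_name = previous_date_func_here"
    )
    local_path = "/tmp/unload.csv"
    with open(local_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)  # write the result set as CSV rows
    # upload the local CSV file to the bucket
    GCSHook(gcp_conn_id="google_cloud_default").upload(
        bucket_name="my-bucket",          # placeholder bucket
        object_name="unload/unload.csv",  # placeholder object path
        filename=local_path,
    )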

I have not tested this yet because I want to test it together with GCS, but it seems like it is not going to work. What are the possible ways to do this? Is it even possible to do this with Airflow?

Upvotes: 0

Views: 948

Answers (1)

Lukasz Szozda

Reputation: 175736

Snowflake supports data unloading using the COPY INTO &lt;location&gt; command:

Unloads data from a table (or query) into one or more files in one of the following locations:

  • Named internal stage (or table/user stage). The files can then be downloaded from the stage/location using the GET command.
  • Named external stage that references an external location (Amazon S3, Google Cloud Storage, or Microsoft Azure).
  • External location (Amazon S3, Google Cloud Storage, or Microsoft Azure).

Format Type Options (formatTypeOptions)

  • TYPE = CSV

  • TYPE = JSON

  • TYPE = PARQUET
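
The my_csv_format name used in the example below refers to a named file format; a minimal sketch of creating one for CSV output (the delimiter, quoting, and compression options here are only assumptions):

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  COMPRESSION = NONE;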


Unloading Data from a Table Directly to Files in an External Location

Google Cloud Storage

Access the referenced GCS bucket using a referenced storage integration named myint:

COPY INTO 'gcs://mybucket/unload/'
FROM mytable
STORAGE_INTEGRATION = myint
FILE_FORMAT = (FORMAT_NAME = my_csv_format);

Related: Configuring an Integration for Google Cloud Storage
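
To tie this back to Composer, here is a minimal sketch of a DAG that runs the COPY INTO statement above through the SnowflakeOperator, reusing the snowflake_conn connection from the question (the DAG id and schedule are assumptions):

from datetime import datetime
from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="snowflake_unload_to_gcs",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # run the unload statement on Snowflake; the files land directly in GCS
    unload = SnowflakeOperator(
        task_id="copy_into_gcs",
        snowflake_conn_id="snowflake_conn",
        sql="""
            COPY INTO 'gcs://mybucket/unload/'
            FROM mytable
            STORAGE_INTEGRATION = myint
            FILE_FORMAT = (FORMAT_NAME = my_csv_format);
        """,
    )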

Upvotes: 0
