Ilya P

Reputation: 187

Use column values as filenames when COPYing into a stage

Is there an out-of-the-box method for Snowflake to use values from a column as the filename when using COPY INTO @mystage? The goal is to copy X files into an S3 stage (essentially PARTITION BY column1), but straight into the stage, without creating subfolders. X would be the number of distinct values in the column.

This can obviously be done manually:

copy into @mystage/mycustomfilename

However, the ideal option would be something like this:

copy into @mystage/$column1

Is there a version of this that Snowflake supports?

As mentioned above, the PARTITION BY option splits the data into subfolders named after the values in the specified column, but Snowflake still uses a generic filename within each subfolder.
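
For reference, this is roughly what the PARTITION BY behaviour looks like (the stage, table, and column names here are illustrative, not from my actual setup):

copy into @mystage
from (select column1, column2 from mytable)
partition by ('column1=' || column1)
file_format = (type = csv);

-- produces paths like:
--   mystage/column1=A/data_0_0_0.csv.gz
--   mystage/column1=B/data_0_0_0.csv.gz
-- i.e. one subfolder per distinct value, with generic filenames inside each subfolder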

Upvotes: 0

Views: 920

Answers (2)

Pankaj

Reputation: 2746

Created structure -

create temporary table temp_tab_split_members(seq_id number, member_id number, name varchar2(30));
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Table TEMP_TAB_SPLIT_MEMBERS successfully created. |
+----------------------------------------------------+

Fake data -

insert into temp_tab_split_members
with cte as
(select seq4(),(trim(mod(seq4(),4))+1)::integer,'my name-'||seq4() from table(generator(rowcount=>12)))
select * from cte;
+-------------------------+
| number of rows inserted |
|-------------------------|
|                      12 |
+-------------------------+

Checking data format -

select * from TEMP_TAB_SPLIT_MEMBERS order by member_id;
+--------+-----------+------------+
| SEQ_ID | MEMBER_ID | NAME       |
|--------+-----------+------------|
|      0 |         1 | my name-0  |
|      4 |         1 | my name-4  |
|      8 |         1 | my name-8  |
|      1 |         2 | my name-1  |
|      5 |         2 | my name-5  |
|      9 |         2 | my name-9  |
|      2 |         3 | my name-2  |
|      6 |         3 | my name-6  |
|     10 |         3 | my name-10 |
|      3 |         4 | my name-3  |
|      7 |         4 | my name-7  |
|     11 |         4 | my name-11 |
+--------+-----------+------------+

Checked stage is empty

list @test_row_stage;
+------+------+-----+---------------+
| name | size | md5 | last_modified |
|------+------+-----+---------------|
+------+------+-----+---------------+

Main procedure to generate files

EXECUTE IMMEDIATE $$
DECLARE
  member_id varchar2(30);
  file_name varchar2(30);
  -- one distinct member_id per output file
  c1 CURSOR FOR SELECT distinct member_id FROM temp_tab_split_members;
BEGIN
  for record in c1 do
    member_id:=record.member_id;
    -- build the filename from the column value
    file_name:='load'||'_'||member_id||'.csv';
    -- unload only this member's rows into the named file on the stage
    execute immediate 'copy into @test_row_stage/'||:file_name||' from
    (select * from temp_tab_split_members where member_id='||:member_id||') overwrite=false';
  end for;
  RETURN 0;
END;
$$
;
+-----------------+
| anonymous block |
|-----------------|
|               0 |
+-----------------+

Check stage contents after procedure execution

list @test_row_stage; -- output truncated columnwise
+----------------------------------------+------+
| name                                   | size |
|----------------------------------------+------+
| test_row_stage/load_1.csv_0_0_0.csv.gz |   48 |
| test_row_stage/load_2.csv_0_0_0.csv.gz |   48 |
| test_row_stage/load_3.csv_0_0_0.csv.gz |   48 |
| test_row_stage/load_4.csv_0_0_0.csv.gz |   48 |
+----------------------------------------+------+
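
Note that Snowflake appends its own suffix (the _0_0_0.csv.gz part) to the supplied name. If the exact filename matters, the COPY inside the loop can be adjusted - a sketch, assuming the same stage and table, using SINGLE = TRUE with uncompressed CSV so the literal name is kept:

copy into @test_row_stage/load_1.csv
from (select * from temp_tab_split_members where member_id=1)
file_format = (type = csv compression = none)
single = true
overwrite = true;

-- expected to appear as test_row_stage/load_1.csv in the LIST output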

File contents cross-check

select $1,$2,$3 from @test_row_stage/load_1.csv_0_0_0.csv.gz union
select $1,$2,$3 from @test_row_stage/load_2.csv_0_0_0.csv.gz union
select $1,$2,$3 from @test_row_stage/load_3.csv_0_0_0.csv.gz union
select $1,$2,$3 from @test_row_stage/load_4.csv_0_0_0.csv.gz;
+----+----+------------+
| $1 | $2 | $3         |
|----+----+------------|
| 0  | 1  | my name-0  |
| 4  | 1  | my name-4  |
| 8  | 1  | my name-8  |
| 1  | 2  | my name-1  |
| 5  | 2  | my name-5  |
| 9  | 2  | my name-9  |
| 2  | 3  | my name-2  |
| 6  | 3  | my name-6  |
| 10 | 3  | my name-10 |
| 3  | 4  | my name-3  |
| 7  | 4  | my name-7  |
| 11 | 4  | my name-11 |
+----+----+------------+

Upvotes: 1

Pankaj

Reputation: 2746

There is no out-of-the-box option for this as far as I understand, but you can write custom code that fetches the values, uses them to name the files, and copies them to the stage/S3. Please refer below for something similar -

EXECUTE IMMEDIATE $$
DECLARE
  company varchar2(30);
  BU varchar2(30);
  eval_desc varchar2(30);
  member_id varchar2(30);
  file_name varchar2(30);
  c1 CURSOR FOR SELECT * FROM test_pivot;
BEGIN
  for record in c1 do
    company:=record.company;
    BU:=record.BU;
    eval_desc:=record.eval_desc;
    member_id:=record.member_id;
    -- build the filename from the column value
    file_name:='load'||'_'||member_id||'.csv';
    -- stage the current row in a temporary table, then unload it under the generated name
    create or replace temporary table temp_test_pvt(company varchar2(30),BU varchar2(30),eval_desc varchar2(30),member_id varchar2(30));
    insert into temp_test_pvt values (:company,:bu,:eval_desc,:member_id);
    execute immediate 'copy into @test_row_stage/'||:file_name||' from (select * from temp_test_pvt) overwrite=false';
  end for;
  RETURN 0;
END;
$$
;

Also, refer to a similar post here - Copy JSON data from Snowflake into S3

Upvotes: 0
