Reputation: 187
Is there an out-of-the-box method for Snowflake to use values from a column as the filename when using COPY INTO @mystage? The goal is to copy X number of files into an S3 stage (essentially PARTITION BY column1), but straight to the stage, not creating subfolders. X would be the number of distinct values in that column.
This can obviously be done manually:
copy into @mystage/mycustomfilename
However, the better option would be something like this:
copy into @mystage/$column1
Is there a version of this that Snowflake supports?
As mentioned above, the PARTITION BY setting splits the data into subfolders named after the values in the specified column, but Snowflake still uses a generic filename within each subfolder.
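For reference, the PARTITION BY approach I'm referring to looks roughly like this (mytable and column1/column2 are placeholders):
-- rough sketch: creates one subfolder per distinct value of column1
-- (named after the value), but the file names inside each subfolder
-- are still generic, e.g. data_0_0_0.csv.gz
copy into @mystage
from (select column1, column2 from mytable)
partition by (to_varchar(column1));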
Upvotes: 0
Views: 920
Reputation: 2746
Created structure -
create temporary table temp_tab_split_members(seq_id number, member_id number, name varchar2(30));
+----------------------------------------------------+
| status |
|----------------------------------------------------|
| Table TEMP_TAB_SPLIT_MEMBERS successfully created. |
+----------------------------------------------------+
Fake data -
insert into temp_tab_split_members
with cte as
(select seq4(),(trim(mod(seq4(),4))+1)::integer,'my name-'||seq4() from table(generator(rowcount=>12)))
select * from cte;
+-------------------------+
| number of rows inserted |
|-------------------------|
| 12 |
+-------------------------+
Checking data format -
select * from TEMP_TAB_SPLIT_MEMBERS order by member_id;
+--------+-----------+------------+
| SEQ_ID | MEMBER_ID | NAME |
|--------+-----------+------------|
| 0 | 1 | my name-0 |
| 4 | 1 | my name-4 |
| 8 | 1 | my name-8 |
| 1 | 2 | my name-1 |
| 5 | 2 | my name-5 |
| 9 | 2 | my name-9 |
| 2 | 3 | my name-2 |
| 6 | 3 | my name-6 |
| 10 | 3 | my name-10 |
| 3 | 4 | my name-3 |
| 7 | 4 | my name-7 |
| 11 | 4 | my name-11 |
+--------+-----------+------------+
Checked that the stage is empty -
list @test_row_stage;
+------+------+-----+---------------+
| name | size | md5 | last_modified |
|------+------+-----+---------------|
+------+------+-----+---------------+
EXECUTE IMMEDIATE $$
DECLARE
  member_id varchar2(30);
  file_name varchar2(30);
  -- one output file per distinct member_id
  c1 CURSOR FOR SELECT distinct member_id FROM temp_tab_split_members;
BEGIN
  for record in c1 do
    member_id:=record.member_id;
    -- build the file name from the column value, e.g. load_1.csv
    file_name:='load'||'_'||member_id||'.csv';
    -- dynamic COPY INTO so the stage path can carry the computed file name
    execute immediate 'copy into @test_row_stage/'||:file_name||' from
    (select * from temp_tab_split_members where member_id='||:member_id||') overwrite=false';
  end for;
  RETURN 0;
END;
$$
;
+-----------------+
| anonymous block |
|-----------------|
| 0 |
+-----------------+
Check stage contents after procedure execution
list @test_row_stage; -- output truncated columnwise
+----------------------------------------+------+
| name | size |
|----------------------------------------+------+
| test_row_stage/load_1.csv_0_0_0.csv.gz | 48 |
| test_row_stage/load_2.csv_0_0_0.csv.gz | 48 |
| test_row_stage/load_3.csv_0_0_0.csv.gz | 48 |
| test_row_stage/load_4.csv_0_0_0.csv.gz | 48 |
+----------------------------------------+------+
File contents cross-check
select $1,$2,$3 from @test_row_stage/load_1.csv_0_0_0.csv.gz union
select $1,$2,$3 from @test_row_stage/load_2.csv_0_0_0.csv.gz union
select $1,$2,$3 from @test_row_stage/load_3.csv_0_0_0.csv.gz union
select $1,$2,$3 from @test_row_stage/load_4.csv_0_0_0.csv.gz;
+----+----+------------+
| $1 | $2 | $3 |
|----+----+------------|
| 0 | 1 | my name-0 |
| 4 | 1 | my name-4 |
| 8 | 1 | my name-8 |
| 1 | 2 | my name-1 |
| 5 | 2 | my name-5 |
| 9 | 2 | my name-9 |
| 2 | 3 | my name-2 |
| 6 | 3 | my name-6 |
| 10 | 3 | my name-10 |
| 3 | 4 | my name-3 |
| 7 | 4 | my name-7 |
| 11 | 4 | my name-11 |
+----+----+------------+
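Note that Snowflake still appends its own suffix (the _0_0_0.csv.gz above) to the name supplied in the COPY command. If the exact file name matters, the COPY inside the loop could be switched to unload each value as a single uncompressed file; this variant was not part of the run above, so treat it as an untested sketch:
-- untested sketch: single=true makes Snowflake use the given path as the
-- exact file name, and compression=none avoids the appended .gz extension
-- (assumes each per-member result stays within the single-file size limit)
execute immediate 'copy into @test_row_stage/'||:file_name||' from
(select * from temp_tab_split_members where member_id='||:member_id||')
file_format=(type=csv compression=none) single=true overwrite=false';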
Upvotes: 1
Reputation: 2746
There is no out-of-the-box (OOB) option for this as far as I understand, but you can write custom code that fetches the values, uses them to name the files, and copies them to the stage/S3. Please refer below for something similar -
EXECUTE IMMEDIATE $$
DECLARE
  company varchar2(30);
  BU varchar2(30);
  eval_desc varchar2(30);
  member_id varchar2(30);
  file_name varchar2(30);
  c1 CURSOR FOR SELECT * FROM test_pivot;
BEGIN
  for record in c1 do
    company:=record.company;
    BU:=record.BU;
    eval_desc:=record.eval_desc;
    member_id:=record.member_id;
    -- derive the target file name from the column value, e.g. load_123.csv
    file_name:='load'||'_'||member_id||'.csv';
    -- stage the current row in a one-row temporary table, then unload it
    create or replace temporary table temp_test_pvt(company varchar2(30),BU varchar2(30),eval_desc varchar2(30),member_id varchar2(30));
    insert into temp_test_pvt values (:company,:bu,:eval_desc,:member_id);
    -- dynamic COPY INTO so the stage path can carry the computed file name
    execute immediate 'copy into @test_row_stage/'||:file_name||' from (select * from temp_test_pvt) overwrite=false';
  end for;
  RETURN 0;
END;
$$
;
Also, refer to a similar post here - Copy JSON data from Snowflake into S3
Upvotes: 0