Mohd Majid
Mohd Majid

Reputation: 1

I want to Store folder name while copying data from S3 bucket to Redshift table

I am trying to load data from S3 bucket to redshift table,there is one column as source id in the table and i want to store the folder name where the source file is available,in to that column.

Actually i have multiple folders in S3 bucket and in each folder i have one file and i port all the files in same table with copy command in redshift, so to identify from which folder the data is, so i need to store the folder name along with data into the Redshift table, i have seperate column in table as Source id.

can any body help me.

Upvotes: 0

Views: 1274

Answers (1)

Jon Scott
Jon Scott

Reputation: 4354

If you are using the Redshift copy command, then you have no choice other than a process to import each folder (e.g. as a temp table) and then set your value manually the the value of the folder that you restored. repeat for each folder.

Another option is to use redshift spectrum and create an external table that maps to your folder as partitions.

first you create your base table like this

create external table spectrum.sales_part(
salesid integer,
listid integer,
sellerid integer,
buyerid integer,
eventid integer,
dateid smallint,
qtysold smallint,
pricepaid decimal(8,2),
commission decimal(8,2),
saletime timestamp)
partitioned by (saledate date)
row format delimited
fields terminated by '|'
stored as textfile
location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/'
table properties ('numRows'='172000');

Then you add partitions to it like this

alter table spectrum.sales_part
add partition(saledate='2008-01-01') 
location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01/';
alter table spectrum.sales_part
add partition(saledate='2008-02-01') 
location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/';
alter table spectrum.sales_part
add partition(saledate='2008-03-01') 
location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03/';

Once you have that set up as an external table, you can use standard sql against that table, for example you could run your queries against that table or copy it to a permanent redshift table using CTAS.

Here is a link to the documentation https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html

Upvotes: 2

Related Questions