Srikant

Loading a Redshift table from multiple S3 folders using a manifest

I am using the COPY command to load a Redshift table from S3 using a manifest.

The requirement is to load multiple files across various folders, for example:

Path 1: s3://bucket_name/folder_name/folder_1/folder/part*.parquet
Path 2: s3://bucket_name/folder_name/folder_2/folder/part*.parquet
Path 3: s3://bucket_name/folder_name/folder_3/folder/part*.parquet

Each path will have about 1000 files.

How do I create a manifest to load all of these files?

I created a manifest as follows:

{
    "fileLocations": [
        {"url":"s3://bucket_name/folder_name/folder_1/folder/part*.parquet", "mandatory":false},
        {"url":"s3://bucket_name/folder_name/folder_3/folder/part*.parquet", "mandatory":false},
        {"url":"s3://bucket_name/folder_name/folder_2/folder/part*.parquet", "mandatory":false}
    ]
}

But I am getting this error:

Manifest does not contain a list of files.

Answers (1)

John Rotenstein

From the Amazon Redshift documentation page "Using a manifest to specify data files":

The following example shows the JSON to load files from different buckets and with file names that begin with date stamps:

{
  "entries": [
    {"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
    {"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
    {"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
    {"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
  ]
}
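With about 1000 Parquet files per folder, writing these entries by hand is impractical, so they would normally be generated. Below is a minimal Python sketch that builds the documented "entries" shape from (url, size) pairs. The helper name build_manifest and the example object names are hypothetical. Including "meta": {"content_length": ...} is an assumption based on my reading of the Redshift docs, which describe that field as needed when a manifest is used to load columnar files such as Parquet.

```python
import json
from typing import Iterable, Tuple


def build_manifest(objects: Iterable[Tuple[str, int]], mandatory: bool = True) -> dict:
    """Build a COPY manifest in the documented "entries" shape.

    `objects` yields (s3_url, size_in_bytes) pairs; each URL is expected to be
    a full object path, not a prefix or wildcard pattern.
    """
    entries = []
    for url, size in objects:
        entries.append({
            "url": url,
            "mandatory": mandatory,
            # Assumption: meta.content_length (object size in bytes) is
            # included because it is described as needed for Parquet loads.
            "meta": {"content_length": size},
        })
    # The file list must sit under the top-level "entries" key.
    return {"entries": entries}


if __name__ == "__main__":
    # Hypothetical object keys and sizes, for illustration only.
    manifest = build_manifest([
        ("s3://bucket_name/folder_name/folder_1/folder/part-00000.parquet", 1024),
        ("s3://bucket_name/folder_name/folder_2/folder/part-00000.parquet", 2048),
    ])
    print(json.dumps(manifest, indent=2))
```

The sizes have to come from S3 itself (for example from an object listing), not from guesses.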

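The problem is probably that your manifest puts its file list under the key fileLocations, while COPY expects it under entries. That would explain the "Manifest does not contain a list of files" error.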
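A rough local check can catch this kind of mistake before running COPY. The sketch below uses a hypothetical helper, manifest_problems, which is not Redshift's own validation. It flags invalid JSON (such as a trailing comma), a missing top-level entries list, and url values that contain glob characters like *.

```python
import json
from typing import List


def manifest_problems(text: str) -> List[str]:
    """Return human-readable problems found in a COPY manifest string.

    A rough local check only; Redshift may reject manifests for other reasons.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    entries = doc.get("entries")
    if not isinstance(entries, list):
        # The asker's case: the list is stored under "fileLocations" instead.
        keys = ", ".join(sorted(doc)) or "none"
        return [f'no "entries" list (top-level keys: {keys})']
    problems = []
    for i, entry in enumerate(entries):
        url = entry.get("url") if isinstance(entry, dict) else None
        if not isinstance(url, str) or not url.startswith("s3://"):
            problems.append(f"entry {i}: missing or non-S3 url")
        elif any(ch in url for ch in "*?["):
            # Glob characters suggest a pattern rather than a single object.
            problems.append(f"entry {i}: wildcard in url {url!r}")
    return problems
```

Run on the manifest from the question, it reports the missing entries list before it reaches the wildcard check.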

I also suspect that wildcards are not permitted: each url should be the full path of a single S3 object, not a pattern or prefix.
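If wildcards are indeed unsupported, the manifest has to list every object explicitly. The sketch below expands each folder into explicit URLs. It assumes a boto3 S3 client and its list_objects_v2 paginator, and it treats part*.parquet as a pattern on file names directly inside each folder. The helper name expand_prefixes is hypothetical.

```python
import fnmatch
from typing import Iterable, List, Tuple


def expand_prefixes(s3_client, bucket: str, prefixes: Iterable[str],
                    pattern: str = "part*.parquet") -> List[Tuple[str, int]]:
    """Expand folder prefixes into explicit (s3_url, size_in_bytes) pairs.

    Only objects directly inside each prefix whose file name matches `pattern`
    are kept, mirroring the question's folder/part*.parquet paths.
    `s3_client` is assumed to be a boto3 S3 client, i.e. boto3.client("s3").
    """
    results = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for prefix in prefixes:
        if not prefix.endswith("/"):
            prefix += "/"  # treat each prefix as a folder
        # list_objects_v2 returns at most 1000 keys per call, so paginate.
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                rest = obj["Key"][len(prefix):]
                # Skip objects in deeper sub-folders and non-matching names.
                if "/" not in rest and fnmatch.fnmatchcase(rest, pattern):
                    results.append((f"s3://{bucket}/{obj['Key']}", obj["Size"]))
    return results
```

For example, expand_prefixes(boto3.client("s3"), "bucket_name", ["folder_name/folder_1/folder/", "folder_name/folder_2/folder/", "folder_name/folder_3/folder/"]) would return pairs that can be written into an entries list like the documentation example above. That manifest can then be uploaded to S3 and referenced from COPY with the MANIFEST option.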