Pramil Paudel

Reputation: 33

Retaining source file name while importing data from S3 to Redshift

I have a large number of files in an S3 bucket that I regularly import into Redshift. Because there are so many files, I need a column in the Redshift table that contains the source file name from the S3 location.

Is there any way to do this?

Upvotes: 3

Views: 1842

Answers (2)

Rishi

Reputation: 26

I agree with Ketan that this is currently not possible in Redshift. If you want to achieve this, you can do either of the following:

  1. Read the S3 files programmatically, write new S3 files that include the file name as a column, and load the new files.
  2. Alternatively, use Hive. Create an external table on the S3 bucket location, use the INPUT__FILE__NAME virtual column to get the file names, create a new table, and write it back to S3. You can also do some pre-processing in Hive.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
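
For option 1, here is a minimal Python sketch of the rewrite step, assuming the files are plain CSV. The function name `add_source_column`, the column name `source_file`, and the example key are hypothetical, chosen only for illustration. It works on text that has already been downloaded, so the S3 read and write are left out.

```python
import csv
import io


def add_source_column(csv_text, source_name, has_header=True,
                      column_name="source_file"):
    """Return csv_text with one extra column holding source_name.

    Assumes plain CSV input; column_name is a hypothetical choice.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_pending = has_header
    for row in reader:
        if not row:
            # Skip blank lines so they do not become one-column rows.
            continue
        if header_pending:
            # The header row gets the new column's name, not the file name.
            writer.writerow(row + [column_name])
            header_pending = False
        else:
            writer.writerow(row + [source_name])
    return out.getvalue()


# Hypothetical object key, used only to show the shape of the output.
print(add_source_column("id,name\n1,a\n2,b\n", "data/part-0001.csv"))
```

You would run this once per object, upload the result to a separate prefix, and point the Redshift load at that prefix, with the target table having one extra column for the file name.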

Hope this helps.

Upvotes: 1

ketan vijayvargiya

Reputation: 5649

That isn't possible. During a COPY operation, Redshift only loads file contents into a table; it doesn't provide access to S3 file names.

To achieve what you want, you need to preprocess the data so that each record inside the files carries the extra information, such as the file name.
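
As a rough illustration of that preprocessing, assuming the files hold one JSON object per line, something like the following would stamp each record with its origin. The function name `tag_json_lines`, the key `source_file`, and the example file name are hypothetical, and fetching the lines from S3 is not shown.

```python
import json


def tag_json_lines(lines, source_name, key="source_file"):
    """Yield each JSON record re-serialized with the source name added.

    Assumes one JSON object per line; key is a hypothetical field name.
    """
    for line in lines:
        line = line.strip()
        if not line:
            # Ignore blank lines between records.
            continue
        record = json.loads(line)
        record[key] = source_name
        yield json.dumps(record)


# Hypothetical file name, used only for illustration.
for tagged in tag_json_lines(['{"id": 1}', '{"id": 2}'], "logs/a.json"):
    print(tagged)
```

The tagged lines would then be written to new files and loaded in place of the originals.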

Upvotes: 1
