Retaining source file name while importing data from S3 to Redshift

Question

I have a large number of files in an S3 bucket that I regularly import into Redshift. Because there are so many files, I need a column in the Redshift table that holds the name of the S3 source file each row came from.

Is there any way to do this?

Rishi · Accepted Answer

I agree with Ketan that this is currently not possible in Redshift itself. If you need it, there are two workarounds:

Read the S3 files programmatically, write new S3 files with the file name added as a column, and load the new files.
Alternatively, use Hive: create an external table on the S3 bucket location, use the INPUT__FILE__NAME virtual column to get the file names, write the result to a new table, and then write it back to S3. You can also do some pre-processing in Hive.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
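
For the first option, here is a rough Python sketch of what "read the files and write new ones" could look like. It is only an illustration under several assumptions: the files are headerless, comma-delimited, UTF-8 CSV and small enough to fit in memory, and the function names, bucket, and prefixes are made up for the example. The S3 part uses boto3; the column-adding helper is plain standard library.

```python
import csv
import io


def add_source_column(csv_text, source_name):
    """Return csv_text with source_name appended as the last field of every row.

    Assumes headerless, comma-delimited CSV (an assumption for this sketch).
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row + [source_name])
    return out.getvalue()


def tag_bucket_files(bucket, src_prefix, dst_prefix):
    """Copy every object under src_prefix to dst_prefix, adding the key as a column.

    bucket, src_prefix and dst_prefix are hypothetical values supplied by the caller.
    """
    import boto3  # imported here so add_source_column works without boto3 installed

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            tagged = add_source_column(body, key)
            new_key = dst_prefix + key[len(src_prefix):]
            s3.put_object(Bucket=bucket, Key=new_key, Body=tagged.encode("utf-8"))
```

After running something like tag_bucket_files on your bucket, you would load from the destination prefix into a table that has one extra column at the end for the file name.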

Hope this helps.
