Reputation: 4571
I want to use the S3 CSV Input step to load multiple files from an S3 bucket, then transform them and load them back into S3. But this step seems to support only one file at a time, and I have to supply the file names. Is there any way to load all files at once by supplying only the bucket name, i.e. <s3-bucket-name>/*?
Upvotes: 1
Views: 5247
Reputation: 48287
Two options:
AWS CLI method
Write a simple shell script that calls the AWS CLI, put it in your PATH, and call it s3.sh:

#!/bin/sh
aws s3 ls s3://bucket.name/path | cut -c32-

The cut -c32- strips the date, time and size columns from the listing, leaving only the filenames.
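If the hard-coded column offset feels brittle, here is a minimal alternative sketch using the lower-level s3api listing, which returns object keys directly (bucket.name and path/ are placeholders for your own bucket and prefix):

#!/bin/sh
# List object keys under a prefix, one key per line.
# --output text prints the keys tab-separated, so tr breaks them onto lines.
aws s3api list-objects-v2 --bucket bucket.name --prefix path/ \
    --query 'Contents[].Key' --output text | tr '\t' '\n'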
In PDI:
1. Generate Rows: Limit: 1, Fields: Name: process, Type: String, Value: s3.sh
2. Execute a Process: Process field: process, Output Line Delimiter: |
3. Split Field to Rows: Field to split: Result output, Delimiter: |, New field name: filename
4. S3 CSV Input: The filename field: filename
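To make the data flow concrete, suppose the bucket holds three CSVs (hypothetical filenames):

$ s3.sh
orders-01.csv
orders-02.csv
orders-03.csv

Execute a Process captures that output in a single field as orders-01.csv|orders-02.csv|orders-03.csv, Split Field to Rows turns it back into one row per filename, and S3 CSV Input then reads each file in turn.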
S3 Local Sync
Mount the S3 directory to a local directory using s3fs. If you have many large files in that bucket directory it won't be very fast, though it might be okay if your PDI runs on an Amazon machine. Then use the standard file reading tools:
$ s3fs my-bucket.example.com:/path ~/my-s3-files -o use_path_request_style -o url=https://s3.us-west-2.amazonaws.com
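For completeness, a sketch of the surrounding mount lifecycle — the mount point is created before running s3fs and the filesystem unmounted afterwards (assumes Linux FUSE; ~/my-s3-files is an arbitrary path):

$ mkdir -p ~/my-s3-files          # create the mount point before mounting
$ ls ~/my-s3-files                # after mounting, bucket objects appear as local files
$ fusermount -u ~/my-s3-files     # unmount when the transformation is done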
Upvotes: 1
Reputation: 1196
S3 CSV Input is inspired by CSV Input and doesn't support multi-file processing the way Text File Input does, for example. You'll have to retrieve the filenames first, so you can loop over the filename list as you would do with CSV Input.
Upvotes: 3