Sammy

Reputation: 161

In NiFi, is it possible to read selectively through the FetchS3Object processor?

In Apache NiFi, I am using FetchS3Object to read from an S3 bucket. I see that it can read all the objects in the bucket, as well as new ones as they are added. Is it possible to:

  1. Configure the processor to read only objects added from now onwards, not the ones already present?
  2. Make it read only a particular folder in the bucket?

NiFi seems great; it is just missing examples in its documentation for at least the popular processors.

Upvotes: 7

Views: 9209

Answers (3)

khushbu kanojia

Reputation: 250

Use the GetSQS and FetchS3Object processors, and configure GetSQS to listen for notifications about newly added files. This is an event-driven approach: whenever a new file arrives, the SQS queue delivers a notification to NiFi. See the link below for full details: AWS-NIFI integration

Upvotes: 2

James

Reputation: 11931

Another approach would be to configure your S3 bucket to send SNS notifications and subscribe an SQS queue to them. NiFi would then read from the SQS queue to receive the notifications, filter for objects of interest, and process them.

See Monitoring An S3 Bucket in Apache NiFi for more on this approach.
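
To make the "filter objects of interest" step concrete, here is a rough Python sketch of what a notification body pulled from the SQS queue might look like and how it could be filtered by folder. It is not NiFi code. The function name `extract_new_objects` and the `prefix` parameter are made up for illustration, and the field names (`Message`, `Records`, `eventName`, `s3.bucket.name`, `s3.object.key`) are assumed from the S3 event notification format, so check them against a real message from your queue.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_new_objects(sqs_body: str, prefix: str = "") -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for newly created objects under `prefix`.

    Hypothetical helper: assumes the body is either an SNS envelope whose
    "Message" field holds the S3 event as a JSON string, or the S3 event itself.
    """
    envelope = json.loads(sqs_body)
    # Unwrap the SNS envelope if there is one.
    event = json.loads(envelope["Message"]) if "Message" in envelope else envelope

    results = []
    for record in event.get("Records", []):
        # Keep only creation events (e.g. "ObjectCreated:Put").
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification (space becomes "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            results.append((bucket, key))
    return results
```

The resulting bucket and key are the two values FetchS3Object needs to retrieve the object itself.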

Upvotes: 2

James

Reputation: 11931

A combination of ListS3 and FetchS3Object processors will do this:

  1. ListS3 - to enumerate your S3 bucket and generate flowfiles referencing each object. Set the Prefix property to a particular folder to enumerate only that subset of the bucket. ListS3 keeps track of what it has read using NiFi's state feature, so it will generate new flowfiles as new objects are added to the bucket.
  2. FetchS3Object - to read S3 objects into flowfile content. You can use the output of ListS3 by configuring the FetchS3Object's Bucket property to ${s3.bucket} and Object Key property to ${filename}.
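
To illustrate the idea behind the two steps (this is not NiFi's actual implementation), here is a small Python sketch of a lister that filters by prefix and remembers how far it has read, so that repeated runs only emit new objects. The `S3ObjectSummary` and `PrefixLister` classes and the timestamp-based state are assumptions made for the example. The attribute names `s3.bucket` and `filename` are the ones referenced in the FetchS3Object configuration above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class S3ObjectSummary:
    """Hypothetical stand-in for one entry of a bucket listing."""
    bucket: str
    key: str
    last_modified: int  # epoch milliseconds


@dataclass
class PrefixLister:
    """Illustrative lister: prefix filter plus remembered state between runs."""
    bucket: str
    prefix: str
    last_seen: int = -1  # newest timestamp emitted so far
    keys_at_last_seen: Set[str] = field(default_factory=set)

    def list_new(self, listing: Iterable[S3ObjectSummary]) -> List[S3ObjectSummary]:
        new_objects = []
        for obj in sorted(listing, key=lambda o: (o.last_modified, o.key)):
            if obj.bucket != self.bucket or not obj.key.startswith(self.prefix):
                continue  # outside the configured "folder"
            if obj.last_modified < self.last_seen:
                continue  # older than the remembered state
            if obj.last_modified == self.last_seen and obj.key in self.keys_at_last_seen:
                continue  # already emitted in an earlier run
            if obj.last_modified > self.last_seen:
                self.last_seen = obj.last_modified
                self.keys_at_last_seen = set()
            self.keys_at_last_seen.add(obj.key)
            new_objects.append(obj)
        return new_objects


def to_flowfile_attributes(obj: S3ObjectSummary) -> Dict[str, str]:
    """Attributes the fetch step reads via ${s3.bucket} and ${filename}."""
    return {"s3.bucket": obj.bucket, "filename": obj.key}
```

Calling `list_new` twice with the same listing returns the matching objects the first time and an empty list the second time, which mirrors how the stored state keeps the flow from re-reading objects it has already listed.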


Upvotes: 9
