SheCodes

Reputation: 595

Detect duplicates on 2 attributes: nifi

I am trying to use NiFi to detect duplicates based on 2 attributes of flow files, so that within any one second there are no duplicate rows whose values for these 2 attributes are the same. These are the settings of my DetectDuplicate processor:

Cache Entry Identifier : ${attribute1_name}::${attribute2_name}

Age Off Duration : 1 sec

Distributed Cache Service : DistributedMapCacheClientService

Still, I am getting duplicate rows in which the values of these 2 attributes are the same within the same second. Any help is much appreciated. Thanks.

Upvotes: 1

Views: 3115

Answers (1)

mattyb

Reputation: 12093

An "Age Off Duration" of 1 second means that a Cache Entry Identifier value that duplicates one that arrived at least one second ago will NOT be considered a duplicate. That property is used to let entries "expire"; some users set it to 24 hours so that the same values can show up again the next day as "not previously seen". If you want to always maintain the "seen" values, leave "Age Off Duration" blank.
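
To make the expiry behaviour concrete, here is a rough Python sketch of the age-off logic described above. It is only a toy model, not NiFi's actual implementation: the `DuplicateDetector` class, its `is_duplicate` method, and the `valueA::valueB` key are hypothetical names made up for illustration, and timestamps are passed in by hand so the behaviour is easy to follow.

```python
from typing import Dict, Optional


class DuplicateDetector:
    """Toy model of a "seen" cache with an optional age-off duration.

    Hypothetical helper for illustration only; this is not NiFi code.
    """

    def __init__(self, age_off_seconds: Optional[float] = None) -> None:
        # None models leaving "Age Off Duration" blank: entries never expire.
        self.age_off_seconds = age_off_seconds
        self._seen: Dict[str, float] = {}  # key -> time the entry was stored

    def is_duplicate(self, key: str, now: float) -> bool:
        stored_at = self._seen.get(key)
        if stored_at is not None:
            expired = (
                self.age_off_seconds is not None
                and now - stored_at >= self.age_off_seconds
            )
            if not expired:
                return True  # still cached, so this is a duplicate
        # First sighting, or the old entry aged off: store it as new.
        self._seen[key] = now
        return False


# Same key seen at t = 0.0 s, 0.5 s and 1.5 s.
key = "valueA::valueB"
one_second = DuplicateDetector(age_off_seconds=1.0)
results_1s = [one_second.is_duplicate(key, t) for t in (0.0, 0.5, 1.5)]

never_expire = DuplicateDetector()
results_blank = [never_expire.is_duplicate(key, t) for t in (0.0, 0.5, 1.5)]
```

With a 1 second age-off, the arrival at 1.5 s is treated as new again because the entry stored at 0.0 s has expired. With no age-off, every arrival after the first is flagged as a duplicate.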

Upvotes: 3
