Reputation: 595
I am trying to use NiFi to detect duplicates based on two flow file attributes, so that within any one second there are no duplicate rows in which both attribute values are the same. In the DetectDuplicate
processor, I have set the following properties:
Cache Entry Identifier
: ${attribute1_name}::${attribute2_name}
Age Off Duration
: 1 sec
Distributed Cache Service
: DistributedMapCacheClientService
Still, I am getting duplicate rows in which both attributes have the same values within the same second. Any help is much appreciated. Thanks.
Upvotes: 1
Views: 3115
Reputation: 12093
An "Age Off Duration" of 1 second means that a Cache Entry Identifier value matching one that arrived one second or more ago will NOT be considered a duplicate. The property exists to let cache entries expire: some users set it to 24 hours so that the same values can show up again the next day as "not previously seen". If you want to keep the "seen" values indefinitely, leave "Age Off Duration" blank.
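To make the expiry behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the semantics described above, not NiFi's actual code: the class `ToyDuplicateDetector`, its `is_duplicate` method, and the explicit `now` argument are all hypothetical names made up for illustration, and the exact boundary handling (expired at exactly the age-off duration) is an assumption.

```python
from typing import Dict, Optional


class ToyDuplicateDetector:
    """Toy model of duplicate detection with an age-off window (not NiFi code)."""

    def __init__(self, age_off_seconds: Optional[float] = None) -> None:
        # None models a blank "Age Off Duration": entries never expire.
        self.age_off_seconds = age_off_seconds
        # Maps each cache key to the time its current entry was stored.
        self._stored_at: Dict[str, float] = {}

    def is_duplicate(self, key: str, now: float) -> bool:
        stored = self._stored_at.get(key)
        if stored is not None:
            expired = (
                self.age_off_seconds is not None
                and now - stored >= self.age_off_seconds
            )
            if not expired:
                return True  # a live entry exists, so this is a duplicate
        # No entry, or the entry aged off: store a fresh one, report "not seen".
        self._stored_at[key] = now
        return False


# Key built the same way as ${attribute1_name}::${attribute2_name}.
key = "A::B"

one_second = ToyDuplicateDetector(age_off_seconds=1.0)
first = one_second.is_duplicate(key, now=0.0)   # False: never seen before
second = one_second.is_duplicate(key, now=0.4)  # True: entry is 0.4 s old
third = one_second.is_duplicate(key, now=1.5)   # False: entry aged off

forever = ToyDuplicateDetector()                # blank age-off
late = (forever.is_duplicate(key, now=0.0),
        forever.is_duplicate(key, now=86400.0))  # (False, True)
```

In this model the one-second window starts when the entry is stored, so it is not aligned to wall-clock seconds: two flow files with the same key that arrive 1.5 seconds apart are both treated as new.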
Upvotes: 3