moinkhan

Reputation: 11

How to ignore .snapshot folder in NiFi ListFile Processor

I am using ListFile & FetchFile in NiFi 1.27.0 with Java 11 to parse some log files. It worked really well in lower environments, but in production, instead of getting a list of files, I see repeated entries like the one below on the Bulletin Board.

15:01:35 GMT
ERROR
570f3c7f-2e25-1a30-0000-00000deb2bcb
All Nodes
ListFile[id=570f3c7f-2e25-1a30-0000-00000deb2bcb] Error during visiting file /path/to/source_logs/.snapshot/snapmirror.d969a104-f1c2-11e9-9d4a-00a098b1ce08_2152237315.2025-03-03_140500/.copy_offload: java.nio.file.NoSuchFileException: /path/to/source_logs/.snapshot/snapmirror.d969a104-f1c2-11e9-9d4a-00a098b1ce08_2152237315.2025-03-03_140500/.copy_offload

If I try to list .copy_offload in a terminal, I get:

[me@somehost snapmirror.d969a104-f1c2-11e9-9d4a-00a098b1ce08_2152237315.2025-03-03_140500]$ ll -la
ls: cannot access .copy_offload: No such file or directory
total 324064
drwxr-xr-x 2064 root root   163840 Jul 25  2024 .
drwxrwxrwx    3 root root     4096 Mar  3 14:06 ..
----------    1 root root        0 Oct 15  2014 .bplusvtoc_internal
??????????    ? ?    ?           ?            ? .copy_offload

Here's what I have tried so far in ListFile:

  1. Ignore Hidden Files - True
  2. File Filter (various regular expressions to get only the files I want)
  3. Path Filter - (a) a regex to exclude .snapshot and, when that didn't work, (b) a regex to include only the sub-folders I am interested in

By using ExecuteStreamCommand, I was able to get a list of files, but considering the overhead of managing state (which files have already been picked up) and the performance cost, this does not seem to be a viable alternative to ListFile & FetchFile.
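
For reference, a minimal sketch of the kind of listing script that could sit behind ExecuteStreamCommand. It is an illustration only, not the script actually used: the function name `list_files` and the `skip_dirs` parameter are hypothetical, and it does no state tracking, which is exactly the overhead mentioned above.

```python
import os


def list_files(root, skip_dirs=(".snapshot",)):
    """Return sorted paths of regular files under root, skipping skip_dirs.

    Hypothetical helper for illustration; the names are assumptions.
    """
    found = []
    # os.walk ignores directory-listing errors by default (onerror=None),
    # so an unreadable directory is skipped instead of aborting the walk.
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place: os.walk will not descend into removed entries.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() returns False (rather than raising) when the entry
            # cannot be stat'ed, e.g. a listed name with no backing file.
            if os.path.isfile(path):
                found.append(path)
    return sorted(found)
```

Printing one path per line from such a function would give ExecuteStreamCommand something to split on, but remembering which files were already emitted would still have to be handled separately.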

A couple of notes:

What else can I try?

Upvotes: 1

Views: 25

Answers (1)

cyberbrain

Reputation: 5135

I would try a simpler and correct regex: ^(?!.*/\.snapshot/?).*$

The underscore would only match (or not match in your case) a literal underscore, and your path doesn't contain it exactly before the word "snapshot".
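
To see what the suggested pattern accepts and rejects, here is a small check in Python. It is a sketch under assumptions: NiFi uses Java regular expressions, but for the constructs in this pattern (negative lookahead, `.*`, escaped dot, optional slash) Python's `re.fullmatch` behaves the same way as Java's `matches()`. The helper name `is_scanned` and the shortened sample paths are made up for illustration. Which string ListFile actually hands to the Path Filter (absolute, or relative to the Input Directory) is not confirmed here, so both forms are tried.

```python
import re

# The pattern suggested above, unchanged.
PATH_FILTER = re.compile(r"^(?!.*/\.snapshot/?).*$")


def is_scanned(path):
    """True if the whole path matches the filter, i.e. it is not excluded."""
    return PATH_FILTER.fullmatch(path) is not None


samples = [
    "/path/to/source_logs/app1",                   # ordinary sub-folder
    "/path/to/source_logs/.snapshot/snapmirror.x", # absolute snapshot path
    ".snapshot/snapmirror.x",                      # relative, no leading slash
    "/path/to/source_logs/.snapshots_old",         # name that only starts with .snapshot
]

for p in samples:
    print(p, "->", "scanned" if is_scanned(p) else "excluded")
```

Two things are worth noticing. The pattern only excludes `.snapshot` when it is preceded by a slash, so a relative path that begins with `.snapshot` still passes. It also excludes any folder whose name merely begins with `.snapshot`, because nothing in the lookahead requires the name to end there.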

Also, please don't combine the ll alias with other options; just use ls -la (ll is usually an alias for ls -l), because the combination can be confusing. And you are presumably not root on that system, while the folder is owned by root (see the owner of . in your listing), so to read the folder completely you should run your ll or ls -la command with sudo or su.

Upvotes: 0
