Krakenudo

Reputation: 309

How can I use the NiFi RouteOnContent processor?

I'm trying to read a log file like this one:

199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085
burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0
199.120.110.21 - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0" 200 4179

I send 1000 lines each time I run this exercise. I split them with a SplitText processor, and in the ExtractText processor I use these regexes:

successCode -> ^[0-9A-Z\-a-z\.]* - - \[[0-9A-Za-z\/\:]* -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" ([0-9]*) [0-9]*
tiemStamp -> ^[0-9A-Z\-a-z\.]* - - \[([0-9A-Za-z\/\:]*) -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" [0-9]* [0-9]*
important -> ^([0-9A-Z\-a-z\.]*) - - \[[0-9A-Za-z\/\:]* -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" [0-9]* [0-9]*

There may be a mistake in these regexes; that is probably where my problem is.

Then I tried to send different logs to different routes. If successCode == 200, I want the line to go to the route /user//success/%{tiemStamp}/, but all my lines go to the third route: "unmatched".

In the RouteOnContent processor I've tried:

successCode -> ${successCode:equals("200")}
successCode -> ${successCode:contains(2)}
successCode -> ${successCode:contains("2")}

Has anyone worked with the RouteOnContent processor?

Upvotes: 2

Views: 10931

Answers (2)

Lior Kirshner

Reputation: 706

Basically, you can use either RouteOnAttribute or RouteOnContent, but each takes different parameters.

If you choose to use ExtractText, the properties you defined are populated as attributes for each row (after the original file has been split by the SplitText processor). You then have two options:

  1. Route based on the attributes that have been extracted (RouteOnAttribute).
  2. Route based on the content (RouteOnContent). In this case, you don't really need to use ExtractText.

Each processor routes the FlowFile differently:

  1. RouteOnAttribute queries the attributes of the FlowFile with a NiFi Expression Language query. For example, if I define the property 'name', I can route on its value with such a query.

  2. On the other hand, RouteOnContent queries the content of the FlowFile with a regular expression.

After defining these properties, you can continue to route based on the dynamic relationships they create.
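To make the difference concrete, here is a minimal Python sketch that only imitates the two routing styles; it is not NiFi code. The function names and the content regex are assumptions made up for illustration. In NiFi itself, the RouteOnAttribute property value would be an Expression Language query such as ${successCode:equals("200")}, while the RouteOnContent property value would be a plain regular expression evaluated against the content.

```python
import re


def route_on_attribute(attributes):
    """Imitate RouteOnAttribute: decide from already-extracted attributes only."""
    # Same idea as the Expression Language query ${successCode:equals("200")}.
    return "success" if attributes.get("successCode") == "200" else "unmatched"


def route_on_content(content, pattern=r'" 200 [0-9]+$'):
    """Imitate RouteOnContent: decide by running a regex on the raw content."""
    # The default pattern is a hypothetical example: a closing quote, the
    # status 200, then the byte count at the end of the line.
    return "success" if re.search(pattern, content) else "unmatched"


ok_line = ('199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
           '"GET /history/apollo/ HTTP/1.0" 200 6245')
not_modified_line = ('burger.letters.com - - [01/Jul/1995:00:00:11 -0400] '
                     '"GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0')

print(route_on_attribute({"successCode": "200"}))  # uses attributes only
print(route_on_content(ok_line))                   # uses content only
print(route_on_content(not_modified_line))
```

An attribute query like ${successCode:equals("200")} placed in RouteOnContent is not a regex describing the log line, which is consistent with every line ending up in "unmatched".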

Upvotes: 2

Val Bonn

Reputation: 1199

According to the documentation, the ExtractText Processor "Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes [...]"

So in the next step you should use a RouteOnAttribute processor, not RouteOnContent.

(If you stop your RouteOnXXX processor so that the messages stay in the queue, you can inspect the flowfiles. The "Attributes" tab of a flowfile shows the values of its attributes. I can confirm that with your regexp, I get successCode=200.)
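The regexes can also be checked outside NiFi. The sketch below is a rough stand-in, assuming Python's re module behaves like the Java regex engine NiFi uses for these particular patterns (they only use syntax common to both). The helper name extract_attributes is hypothetical; the patterns and the sample line are copied from the question.

```python
import re

# The three ExtractText properties from the question, copied verbatim.
PATTERNS = {
    "successCode": r'^[0-9A-Z\-a-z\.]* - - \[[0-9A-Za-z\/\:]* -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" ([0-9]*) [0-9]*',
    "tiemStamp": r'^[0-9A-Z\-a-z\.]* - - \[([0-9A-Za-z\/\:]*) -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" [0-9]* [0-9]*',
    "important": r'^([0-9A-Z\-a-z\.]*) - - \[[0-9A-Za-z\/\:]* -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" [0-9]* [0-9]*',
}


def extract_attributes(line):
    """Return {property name: first capture group} for each pattern that matches."""
    attributes = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, line)
        if match:
            attributes[name] = match.group(1)
    return attributes


sample = ('199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
          '"GET /history/apollo/ HTTP/1.0" 200 6245')

print(extract_attributes(sample))
```

For the sample line this yields successCode=200, which matches what the "Attributes" tab shows, so the regexes themselves are not the problem.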

Upvotes: 3
