Skiel

Reputation: 337

How to avoid flowfile failure records in NiFi?

I have a JSON file with almost 500,000 records, and some of them are malformed or have a double quote in the wrong place. When I run UpdateRecord, a warning shows which records failed, but none of the other, valid records get loaded either.

I am using this sequence of processors:

GetFile -> UpdateAttribute -> ConvertCharacterSet -> UpdateRecord -> PutParquet

GetFile -> Used to get the file

UpdateAttribute -> Updates some attributes (nothing important)

ConvertCharacterSet -> ASCII to UTF-8, because the records contain á é í ó ú ñ characters.

UpdateRecord -> To mask one record

PutParquet -> To save the file in parquet.

I don't know how to send the good records from UpdateRecord to PutParquet and the bad ones to an error log.

Maybe I need another processor, but I tried ValidateRecord and it didn't work (maybe it was badly configured).

Example of failure in my records:


2020-10-06 01:47:23,471 ERROR org.apache.nifi.processors.standard.UpdateRecord: UpdateRecord[id=36473d38-5d59-1fae-82c1-5f46f50cbfab] Failed to process StandardFlowFileRecord[uuid=0152318c-d126-4c48-8b2e-3f41413724b8,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1601948827512-2676, container=default, section=628], offset=0, length=1587131043],offset=0,name=auditoria_20200929.txt.prq,size=1587131043]; will route to failure: org.apache.nifi.processor.exception.ProcessException: IOException thrown from UpdateRecord[id=36473d38-5d59-1fae-82c1-5f46f50cbfab]: org.codehaus.jackson.JsonParseException: Unexpected character ('B' (code 66)): was expecting double-quote to start field name
 at [Source: java.io.BufferedInputStream@35fab6ff; line: 179249, column: 393373]

2020-10-06 01:47:52,539 ERROR org.apache.nifi.processors.parquet.PutParquet: PutParquet[id=f7baa377-0174-1000-b6f3-ee3d6768eadd] Failed to write due to org.codehaus.jackson.JsonParseException: Unexpected character ('B' (code 66)): was expecting double-quote to start field name
 at [Source: java.io.BufferedInputStream@8295d5; line: 179249, column: 393373]: org.codehaus.jackson.JsonParseException: Unexpected character ('B' (code 66)): was expecting double-quote to start field name
 at [Source: java.io.BufferedInputStream@8295d5; line: 179249, column: 393373]
org.codehaus.jackson.JsonParseException: Unexpected character ('B' (code 66)): was expecting double-quote to start field name
 at [Source: java.io.BufferedInputStream@8295d5; line: 179249, column: 393373]

The idea is to send all the failed records to the person who produced them.

My UpdateRecord config:

[Image: UpdateRecord configuration]

Upvotes: 1

Views: 1119

Answers (2)

Mike Thomsen

Reputation: 37526

One potential workaround:

  1. Convert your JSON data to UTF-8 encoding at the point of creation.
  2. Add an argument in $NIFI_ROOT/conf/bootstrap.conf that adds -Dfile.encoding=UTF-8 to the early JVM arguments. It needs to be early in the argument order, like right after -Xmx and -Xms, I think. This starts the JVM with its default encoding set to UTF-8. I don't know if that's required on Linux, but the default charset on Windows is not UTF-8, AFAIK.
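
As a rough illustration of step 1, here is a minimal Python sketch that re-encodes a file to UTF-8 before NiFi picks it up. The source encoding (ISO-8859-1 here) is an assumption, since the question does not say what the file's real encoding is, and the function names to_utf8 and transcode_file are made up for this example.

```python
def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def transcode_file(src_path, dst_path, source_encoding="iso-8859-1"):
    """Stream a text file from the assumed source encoding to UTF-8.

    Reads in chunks so a large file is never held in memory at once.
    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # iter() with a sentinel stops when read() returns an empty string.
        for chunk in iter(lambda: src.read(1 << 20), ""):
            dst.write(chunk)
```

If the file really is in another encoding (for example windows-1252), pass that name as source_encoding instead. Decoding with the wrong encoding will not raise an error for ISO-8859-1, it will just produce the wrong characters, so it is worth checking a few accented records by eye.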

Upvotes: 0

Sdairs

Reputation: 2032

ValidateRecord would be the logical way to go, so perhaps you need to debug your ValidateRecord config some more.

Alternatively, you could introduce a SplitRecord processor to split each record into individual FlowFiles. Then one conversion failure would not impact any other records, and you could route the failures wherever you want. However, this does introduce overhead and may impact the overall performance of your flow.
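
To show why per-record isolation helps, here is a minimal Python sketch of the same idea outside NiFi. It is not how SplitRecord is implemented. It assumes the file holds one JSON object per line, which the question does not confirm, and the name route_records and the tab-separated error format are hypothetical.

```python
import json


def route_records(src, good_out, bad_out):
    """Parse each line on its own so one bad record cannot fail the rest.

    src, good_out and bad_out are open text file objects.
    Returns a (good_count, bad_count) tuple.
    """
    good_count = bad_count = 0
    for line_no, line in enumerate(src, start=1):
        text = line.strip()
        if not text:
            continue  # skip blank lines
        try:
            record = json.loads(text)
            # Assumption: every valid record is a JSON object.
            if not isinstance(record, dict):
                raise ValueError("record is not a JSON object")
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError.
            # Keep the line number so the producer can find the record.
            bad_out.write(f"{line_no}\t{text}\n")
            bad_count += 1
        else:
            good_out.write(text + "\n")
            good_count += 1
    return good_count, bad_count
```

The good output could then go through the normal flow, and the bad output could be sent back to whoever produced the file. If the records are not one per line, this approach does not apply as written.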

Upvotes: 1
