Reputation: 741
So, I'm using NiFi for the first time. I'm trying to use it to call an API and then to pipe the data into HDFS (Hortonworks Sandbox 2.4). I'm using just 2 processors currently: GetHTTP & PutHDFS.
Both processors seem to be configured OK and they run, but I can't find the output file when I go into Hadoop through Ambari. I've set the output directory to /user/, but nothing appears there. However, I am getting a warning message on the PutHDFS processor:
WARNING PutHDFS[...] penalizing StandardFlowFileRecord[...] and routing to failure because file with same name already exists.
...so a file must be getting written somewhere. I've tried varying the API call, specifying both XML and JSON formats, but it makes no apparent difference.
I figure I must either need to add some processing to the pipeline in NiFi, or I'm looking in the wrong place in the sandbox. Can anyone advise, please?
Upvotes: 0
Views: 1810
Reputation: 741
Finally got this working. I built a dataflow comprising four processors.
I think it was a case of correctly specifying the Auto Terminate Relationships (selecting both 'success' and 'failure' in the processor's Settings tab).
Credit to http://nifi.rocks/getting-started-with-apache-nifi, which provided the building blocks, and thanks to others for their comments.
Upvotes: 0
Reputation: 1852
The PutHDFS processor reads the "filename" attribute on the incoming FlowFile and uses it as the filename in HDFS, as stated in the documentation[1].
GetHTTP sets the "filename" attribute to be "..the name of the file on the remote server"[2]. So I'm guessing your GetHTTP processor is fetching the same file each time, and thus the "filename" attribute is the same on every FlowFile.
To get around that error, you need an UpdateAttribute processor[3] that changes the "filename" attribute to a unique value.
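For example (a minimal sketch using standard NiFi Expression Language; any expression that yields a unique string will do), add a property named "filename" to the UpdateAttribute processor with a value such as:

    filename = ${filename}-${now():toNumber()}

or simply

    filename = ${UUID()}

The first appends the current epoch timestamp in milliseconds to the original name; the second replaces the name with a randomly generated UUID.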
Upvotes: 3