Reputation: 741
So, I'm using NiFi for the first time. I'm trying to use it to call an API and then to pipe the data into HDFS (Hortonworks Sandbox 2.4). I'm using just 2 processors currently: GetHTTP & PutHDFS.
Both processors seem to be configured OK and they run, but I can't find the output file when I go into Hadoop through Ambari. I've set the output directory to /user/, but nothing appears there. However, I am getting a warning message on the PutHDFS processor:
WARNING PutHDFS[...] penalizing StandardFlowFileRecord[...] and routing to failure because file with same name already exists.
...so a file must be getting written somewhere. I've tried varying the API call, specifying both XML and JSON formats, but it makes no apparent difference.
I figure I must either need to add some processing to the pipeline in NiFi, or I'm looking in the wrong place in the sandbox. Can anyone advise, please?
Upvotes: 0
Views: 1810
Reputation: 741
Finally got this working. I built a dataflow comprising four processors.
I think it was a case of correctly specifying the Auto Terminate Relationships (selecting both 'success' and 'failure' in the processor's Settings tab).
Credit to http://nifi.rocks/getting-started-with-apache-nifi, which provided the building blocks, and thanks to others for their comments.
Upvotes: 0
Reputation: 1852
The PutHDFS processor reads the "filename" attribute on the incoming FlowFile and uses it as the filename in HDFS, as stated in the documentation[1].
GetHTTP sets the "filename" attribute to be "..the name of the file on the remote server"[2]. So I'm guessing your GetHTTP processor is fetching the same file each time, and thus the "filename" attribute is the same on every FlowFile.
To get around that error, you need an UpdateAttribute processor[3] that changes the "filename" attribute to a unique value.
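For example (a minimal sketch using standard NiFi Expression Language; any expression that yields a unique string will do), add a property named "filename" to the UpdateAttribute processor with a value such as:

    filename = ${filename}-${now():toNumber()}

or simply

    filename = ${UUID()}

The first appends the current epoch timestamp in milliseconds to the original name; the second replaces the name with a randomly generated UUID.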
Upvotes: 3