Deepanshu

Reputation: 132

Copy files from SFTP server to HDFS using NiFi

I'm trying to load a large dataset of 225 GB (~175,000 files) from an SFTP server and copy the data to HDFS.

To implement this scenario, we've used 2 processors:

  1. GetSFTP (to get the files from the SFTP server)

Configured Processor -> Search Recursively = true; Use Natural Ordering = true; Remote Poll Batch Size = 5000; Concurrent Tasks = 3

  2. PutHDFS (to push the data to HDFS)

Configured Processor -> Concurrent Tasks = 3; Conflict Resolution Strategy = replace; Hadoop Configuration Resources; Directory
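For reference, the flow described above corresponds roughly to the following processor properties. The hostname, username, and paths are placeholders for illustration only, not values from the original post:

```properties
# GetSFTP (3 concurrent tasks)
Hostname               = sftp.example.com    # placeholder
Port                   = 22
Username               = nifi                # placeholder
Remote Path            = /data/incoming      # placeholder
Search Recursively     = true
Use Natural Ordering   = true
Remote Poll Batch Size = 5000

# PutHDFS (3 concurrent tasks)
Hadoop Configuration Resources = /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Conflict Resolution Strategy   = replace
Directory                      = /landing/sftp    # placeholder
```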

But after some time the copying stops and the data size in HDFS stops increasing. When I set Remote Poll Batch Size in the GetSFTP configuration to 5000, the total data pushed to HDFS is 6.4 GB; when set to 20000, the total pushed is 25 GB.

But I can't seem to figure out what I'm doing wrong.

Upvotes: 0

Views: 4316

Answers (1)

notNull

Reputation: 31470

Make sure you have scheduled the GetSFTP processor to run with a Timer Driven (or) Cron Driven scheduling strategy.
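On the processor's Scheduling tab, that looks something like the following (the run schedule value is illustrative):

```properties
Scheduling Strategy = Timer Driven
Run Schedule        = 0 sec    # run as often as possible; illustrative value
Concurrent Tasks    = 3
```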

The ideal solution would be to use the ListSFTP + FetchSFTP processors instead of the GetSFTP processor.

Refer to this link for configuring/using the List + Fetch SFTP processors.
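As a rough sketch of that pattern (hostname and path values below are placeholders): ListSFTP runs on the primary node and keeps state of files it has already listed, so files are not re-fetched after a restart, and FetchSFTP pulls each listed file using the attributes ListSFTP writes. The processors are typically connected ListSFTP -> FetchSFTP -> PutHDFS.

```properties
# ListSFTP (Execution = Primary node; maintains listing state across runs)
Hostname           = sftp.example.com    # placeholder
Port               = 22
Remote Path        = /data/incoming      # placeholder
Search Recursively = true

# FetchSFTP (reads the attributes written by ListSFTP)
Hostname            = ${sftp.remote.host}
Port                = ${sftp.remote.port}
Remote File         = ${path}/${filename}
Completion Strategy = None    # or Move File / Delete File once fetched
```

Because ListSFTP only emits each file once, this avoids the re-polling and batching limits that can stall a GetSFTP-based flow on very large directories.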

Upvotes: 3
