Reputation: 362
We are getting new files every day from apps in the form of CSVs stored on a Windows server, say c:/program files(x86)/webapps/apachetomcat/.csv, each file having different data in it. Is there any Hadoop component to transfer files from the Windows server to Hadoop HDFS? I came across Flume and Kafka but could not find a proper example. Can anyone shed light here?
Each file has a separate name and a size of up to 10-20 MB, and the daily file count is more than 200 files. Once the files are added to the Windows server, Flume/Kafka should be able to put those files into Hadoop. Later, the files are read from HDFS, processed by Spark, and the processed files are moved to another folder in HDFS.
Upvotes: 0
Views: 71
Reputation: 97
Flume is the best choice. A Flume agent (process) needs to be configured. A Flume agent has 3 parts:
Flume source - The place where Flume will look for new files. c:/program files(x86)/webapps/apachetomcat/.csv in your case.
Flume sink - The place where Flume will send the files. The HDFS location in your case.
Flume channel - The temporary location of your file before it is sent to the sink. You need to use the "File Channel" for your case.
Click here for an example.
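In addition to the linked example, here is a minimal sketch of such an agent, using a spooling-directory source, a file channel, and an HDFS sink. The agent name, local directories, and HDFS path are placeholder assumptions, not values from the question:

# csv-ingest.conf - Flume agent "a1": spooldir source -> file channel -> HDFS sink
a1.sources  = src1
a1.channels = ch1
a1.sinks    = sink1

# Source: watch the CSV drop folder for new files (placeholder path)
a1.sources.src1.type     = spooldir
a1.sources.src1.spoolDir = /data/webapps/apachetomcat/csv
a1.sources.src1.channels = ch1

# Channel: durable file channel so events survive an agent restart
a1.channels.ch1.type          = file
a1.channels.ch1.checkpointDir = /var/flume/checkpoint
a1.channels.ch1.dataDirs      = /var/flume/data

# Sink: write events to HDFS as plain text (placeholder namenode and path)
a1.sinks.sink1.type              = hdfs
a1.sinks.sink1.hdfs.path         = hdfs://namenode:8020/user/hadoop/incoming
a1.sinks.sink1.hdfs.fileType     = DataStream
a1.sinks.sink1.hdfs.rollSize     = 0
a1.sinks.sink1.hdfs.rollCount    = 0
a1.sinks.sink1.hdfs.rollInterval = 300
a1.sinks.sink1.channel           = ch1

Start it with something like:

flume-ng agent --conf conf --conf-file csv-ingest.conf --name a1

Note that the spooling-directory source expects files to be fully written and not modified after they land in spoolDir; it renames each file with a .COMPLETED suffix once it has been ingested.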
Upvotes: 1
Reputation: 681
As per my comment, more details would help narrow down the possibilities. As a first thought: move the files to a server that has a Hadoop client, then just create a bash script and schedule it with cron (a sketch of such a script follows the put reference below).
put
Usage: hdfs dfs -put <localsrc> ... <dst>
Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.
hdfs dfs -put localfile /user/hadoop/hadoopfile
hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile Reads the input from stdin.
Exit Code:
Returns 0 on success and -1 on error.
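For example, a minimal cron-driven sketch of that approach, assuming the CSV drop folder is reachable (e.g. via a mount) from a host with the Hadoop client installed; every path below is a placeholder, not taken from the question:

#!/bin/bash
# Push newly arrived CSV files into HDFS, then archive them locally
# so the same file is not uploaded twice.
SRC_DIR="/mnt/webapps/apachetomcat"   # mounted CSV drop folder (placeholder)
HDFS_DIR="/user/hadoop/incoming"      # HDFS landing directory (placeholder)
ARCHIVE_DIR="/mnt/webapps/uploaded"   # local archive for uploaded files (placeholder)

for f in "$SRC_DIR"/*.csv; do
  [ -e "$f" ] || continue             # no files matched the glob
  if hdfs dfs -put "$f" "$HDFS_DIR"/; then
    mv "$f" "$ARCHIVE_DIR"/           # archive only after a successful put
  fi
done

Scheduled every few minutes with a crontab entry such as:

*/5 * * * * /opt/scripts/put_csv_to_hdfs.sh >> /var/log/put_csv_to_hdfs.log 2>&1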
Upvotes: 0