Reputation: 362
We are getting new files every day from apps in the form of CSVs stored on a Windows server, say c:/program files(x86)/webapps/apachetomcat/.csv, each file having different data in it. Is there any Hadoop component to transfer files from the Windows server to Hadoop HDFS? I came across Flume and Kafka but could not find a proper example. Can anyone shed light here?
Each file has a separate name and a size of up to 10-20 MB, and the daily file count is more than 200 files. Once the files are added to the Windows server, Flume/Kafka should be able to put those files into Hadoop. Later, the files are read from HDFS, processed by Spark, and the processed files are moved to another folder in HDFS.
Upvotes: 0
Views: 71
Reputation: 97
Flume is the best choice. A Flume agent (process) needs to be configured. A Flume agent has 3 parts:
Flume source - The place where Flume will look for new files. c:/program files(x86)/webapps/apachetomcat/.csv in your case.
Flume sink - The place where Flume will send the files. The HDFS location in your case.
Flume channel - The temporary location of your file before it is sent to the sink. You need to use the "File Channel" for your case.
Click here for an example.
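In addition to the linked example, here is a minimal sketch of such an agent, using a spooling-directory source, a file channel, and an HDFS sink. The agent name, local directories, and HDFS path are placeholder assumptions, not values from the question:

# csv-ingest.conf - Flume agent "a1": spooldir source -> file channel -> HDFS sink
a1.sources  = src1
a1.channels = ch1
a1.sinks    = sink1

# Source: watch the CSV drop folder for new files (placeholder path)
a1.sources.src1.type     = spooldir
a1.sources.src1.spoolDir = /data/webapps/apachetomcat/csv
a1.sources.src1.channels = ch1

# Channel: durable file channel so events survive an agent restart
a1.channels.ch1.type          = file
a1.channels.ch1.checkpointDir = /var/flume/checkpoint
a1.channels.ch1.dataDirs      = /var/flume/data

# Sink: write events to HDFS as plain text (placeholder namenode and path)
a1.sinks.sink1.type              = hdfs
a1.sinks.sink1.hdfs.path         = hdfs://namenode:8020/user/hadoop/incoming
a1.sinks.sink1.hdfs.fileType     = DataStream
a1.sinks.sink1.hdfs.rollSize     = 0
a1.sinks.sink1.hdfs.rollCount    = 0
a1.sinks.sink1.hdfs.rollInterval = 300
a1.sinks.sink1.channel           = ch1

Start it with something like:

flume-ng agent --conf conf --conf-file csv-ingest.conf --name a1

Note that the spooling-directory source expects files to be fully written and not modified after they land in spoolDir; it renames each file with a .COMPLETED suffix once it has been ingested.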
Upvotes: 1
Reputation: 681
As per my comment, more details would help narrow down the possibilities. As a first thought: move the files to a server that has a Hadoop client, then just create a bash script and schedule it with cron (a sketch of such a script follows the put reference below).
put
Usage: hdfs dfs -put <localsrc> ... <dst>
Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.
hdfs dfs -put localfile /user/hadoop/hadoopfile
hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile Reads the input from stdin.
Exit Code:
Returns 0 on success and -1 on error.
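For example, a minimal cron-driven sketch of that approach, assuming the CSV drop folder is reachable (e.g. via a mount) from a host with the Hadoop client installed; every path below is a placeholder, not taken from the question:

#!/bin/bash
# Push newly arrived CSV files into HDFS, then archive them locally
# so the same file is not uploaded twice.
SRC_DIR="/mnt/webapps/apachetomcat"   # mounted CSV drop folder (placeholder)
HDFS_DIR="/user/hadoop/incoming"      # HDFS landing directory (placeholder)
ARCHIVE_DIR="/mnt/webapps/uploaded"   # local archive for uploaded files (placeholder)

for f in "$SRC_DIR"/*.csv; do
  [ -e "$f" ] || continue             # no files matched the glob
  if hdfs dfs -put "$f" "$HDFS_DIR"/; then
    mv "$f" "$ARCHIVE_DIR"/           # archive only after a successful put
  fi
done

Scheduled every few minutes with a crontab entry such as:

*/5 * * * * /opt/scripts/put_csv_to_hdfs.sh >> /var/log/put_csv_to_hdfs.log 2>&1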
Upvotes: 0