Reputation:
I need a system to analyze large log files. A friend pointed me to Hadoop the other day, and it seems perfect for my needs. My question is about getting data into Hadoop:
Is it possible to have the nodes on my cluster stream data into HDFS as they receive it? Or would each node need to write to a local temp file and submit that temp file once it reaches a certain size? And is it possible to append to a file in HDFS while also running queries/jobs against that same file at the same time?
Upvotes: 2
Views: 4241
Reputation: 1441
The Fluentd log collector just released its WebHDFS plugin, which lets users stream data into HDFS as it arrives. It's easy to install and easy to manage.
Of course, you can also import data directly from your applications. Here's a Java example of posting logs to Fluentd.
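Below is a minimal sketch using the fluent-logger-java library, assuming a Fluentd agent is listening on localhost:24224 with a WebHDFS output matched to the "app.*" tag; the tag and record fields are placeholders:

    import java.util.HashMap;
    import java.util.Map;
    import org.fluentd.logger.FluentLogger;

    public class FluentdPost {
        // Connect to the local Fluentd agent (default forward port 24224)
        private static final FluentLogger LOG =
                FluentLogger.getLogger("app", "localhost", 24224);

        public static void main(String[] args) {
            // Each call sends one structured event; the Fluentd webhdfs output
            // then streams the buffered events into HDFS.
            Map<String, Object> record = new HashMap<String, Object>();
            record.put("path", "/index.html");
            record.put("status", 200);
            record.put("bytes", 1523);
            LOG.log("access", record); // full tag becomes "app.access"
        }
    }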
Upvotes: 2
Reputation: 4236
I'd recommend using Flume to collect the log files from your servers into HDFS.
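For example, here is a rough Flume NG agent configuration sketch (the agent/source/sink names, file paths, and NameNode address are all placeholders) that tails a local log file and writes the events into HDFS:

    # flume.conf -- illustrative only, adjust to your setup
    agent.sources = tail-src
    agent.channels = mem-ch
    agent.sinks = hdfs-sink

    # Tail the application log and feed events through an in-memory channel
    agent.sources.tail-src.type = exec
    agent.sources.tail-src.command = tail -F /var/log/app/access.log
    agent.sources.tail-src.channels = mem-ch

    agent.channels.mem-ch.type = memory
    agent.channels.mem-ch.capacity = 10000

    # Roll events into HDFS as plain text files
    agent.sinks.hdfs-sink.type = hdfs
    agent.sinks.hdfs-sink.channel = mem-ch
    agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/logs
    agent.sinks.hdfs-sink.hdfs.fileType = DataStream
    agent.sinks.hdfs-sink.hdfs.rollInterval = 300

The agent would then be started with something like flume-ng agent -n agent -f flume.conf.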
Upvotes: 0
Reputation: 4107
A Hadoop job can run over multiple input files, so there's really no need to keep all your data in one file. You won't be able to process a file until its file handle is properly closed, however.
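For instance, a job can simply be pointed at one or more directories of closed log files using the standard FileInputFormat API; in this sketch the paths are made up for illustration and no custom mapper/reducer is set:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogJobSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "log analysis");
            job.setJarByClass(LogJobSetup.class);
            // Every closed file under these directories becomes part of the input
            FileInputFormat.addInputPath(job, new Path("/logs/2012-10-17"));
            FileInputFormat.addInputPath(job, new Path("/logs/2012-10-18"));
            FileOutputFormat.setOutputPath(job, new Path("/output/log-analysis"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }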
Upvotes: 1
Reputation: 8986
HDFS does not support appends (yet?)
What I do is run the map-reduce job periodically and write the results to a 'processed_logs_#{timestamp}' folder. Another job can later take these processed logs and push them to a database, etc., so they can be queried online.
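A rough sketch of the timestamped-output part of that pattern (the folder name format is just illustrative):

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TimestampedOutput {
        // Called by the periodic driver before submitting the job: each run gets a
        // fresh output folder, so earlier results stay in place for follow-up jobs.
        static void setTimestampedOutput(Job job) {
            String stamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date());
            FileOutputFormat.setOutputPath(job, new Path("/processed_logs_" + stamp));
        }
    }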
Upvotes: 0