krisdigitx

Reputation: 7136

hadoop suggestions on how to process logs

I need some suggestions on how to process infrastructure logs using Hadoop in Java rather than Pig, as I think Pig does not support regex filters while reading log files.

As an example, I have Cisco logs and web server logs, and I want to filter specific values line by line and feed them into Hadoop.

There are a couple of suggestions online, e.g. to first convert the logs to CSV format, but what if the log file is several GBs?

Is it possible to filter the lines at the "map" stage, i.e. the program reads lines from the file in HDFS, sends them to the mapper, and the mapper keeps only the matching ones?
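Sketching what I have in mind: a mapper that drops non-matching lines, so no CSV conversion step is needed. This is only an illustration, not a working setup; the class name and the regex are made up, and the Hadoop libraries are assumed to be on the classpath:

```java
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits only the input lines that match the regex; everything else
// is discarded at the map stage. Class name and pattern are
// illustrative placeholders.
public class LogFilterMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final Pattern CISCO_LINE =
            Pattern.compile(".*%SEC-6-IPACCESSLOGP.*"); // example pattern

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (CISCO_LINE.matcher(value.toString()).matches()) {
            context.write(value, NullWritable.get());
        }
    }
}
```

Is this the right approach, or is there a cleaner way?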

I need some suggestions on the best and cleanest way to do this.

Thanks.

Upvotes: 1

Views: 164

Answers (1)

Jagadish Talluri

Reputation: 688

You can do regex operations in Pig; internally, Pig uses Java's regex specification.

Please go through the following example:

    myfile = LOAD '999999-99999-2007' AS (a:chararray);
    filterfile = FILTER myfile BY a MATCHES '.*DAY+.*';
    selectfile = FOREACH filterfile GENERATE a, SIZE(a) AS size;
    STORE selectfile INTO '/home/jagadish/selectfile';

The file used in the example is 2.7 GB and contains 11 million lines, of which the regex matches about 450,000.
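If you still prefer the Java route, the filtering logic behind Pig's `MATCHES` is the same `java.util.regex` machinery you would put inside a mapper's `map()` method. A minimal, self-contained sketch (the class name and sample lines are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LineFilter {
    // Same pattern as the Pig MATCHES example above.
    private static final Pattern DAY = Pattern.compile(".*DAY+.*");

    // Keeps only the lines matching the pattern. In a real MapReduce
    // job, this check would run once per line inside map(), emitting
    // only the matches.
    public static List<String> filter(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (DAY.matcher(line).matches()) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines =
                List.of("MONDAY log entry", "no match here", "TUESDAY event");
        System.out.println(filter(lines));
    }
}
```

Note that `Matcher.matches()` must consume the whole line, which is why the pattern is wrapped in `.*` on both sides, exactly as in the Pig script.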

I believe this answers your question; otherwise, please let me know.

Upvotes: 6
