User456898

Reputation: 5724

Flume - Stream log file from Windows to HDFS in Linux

How can I stream a log file from Windows 7 to HDFS on Linux?

Flume on Windows is giving an error.

I have installed 'flume-node-0.9.3' on Windows 7 (Node 1). The 'flumenode' service is running and localhost:35862 is accessible.
On Windows, the log file is located at 'C:/logs/Weblogic.log'.
The Flume agent on CentOS Linux (Node 2) is also running.

  1. On the Windows machine, the JAVA_HOME variable is set to "C:\Program Files\Java\jre7"
  2. The java.exe file is located at "C:\Program Files\Java\jre7\bin\java.exe"
  3. Flume node is installed at "C:\Program Files\Cloudera\Flume 0.9.3"

Here is the flume-src.conf file placed inside the 'conf' folder of Flume on Windows 7 (Node 1):

source_agent.sources = weblogic_server
source_agent.sources.weblogic_server.type = exec
source_agent.sources.weblogic_server.command = tail -f C:/logs/Weblogic.log
source_agent.sources.weblogic_server.batchSize = 1
source_agent.sources.weblogic_server.channels = memoryChannel
source_agent.sources.weblogic_server.interceptors = itime ihost itype

source_agent.sources.weblogic_server.interceptors.itime.type = timestamp

source_agent.sources.weblogic_server.interceptors.ihost.type = host
source_agent.sources.weblogic_server.interceptors.ihost.useIP = false
source_agent.sources.weblogic_server.interceptors.ihost.hostHeader = host

source_agent.sources.weblogic_server.interceptors.itype.type = static
source_agent.sources.weblogic_server.interceptors.itype.key = log_type
source_agent.sources.weblogic_server.interceptors.itype.value = apache_access_combined

source_agent.channels = memoryChannel
source_agent.channels.memoryChannel.type = memory
source_agent.channels.memoryChannel.capacity = 100

source_agent.sinks = avro_sink
source_agent.sinks.avro_sink.type = avro
source_agent.sinks.avro_sink.channel = memoryChannel
source_agent.sinks.avro_sink.hostname = 10.10.201.40

source_agent.sinks.avro_sink.port = 41414

I tried to run the above configuration by executing the following command inside the Flume folder:

C:\Program Files\Cloudera\Flume 0.9.3>"C:\Program Files\Java\jre7\bin\java.exe" 
-Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -cp "C:\Program Files\Cloudera\Flume 0.9.3\lib*" org.apache.flume.node.Application 
-f C:\Program Files\Cloudera\Flume 0.9.3\conf\flume-src.conf -n source_agent

But it gives the following message:

Error: Could not find or load main class Files\Cloudera\Flume

Here is the trg-node.conf file running on CentOS (Node 2); the CentOS node is working fine:

collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 41414
collector.sources.AvroIn.channels = mc1 mc2

collector.channels = mc1 mc2
collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100
collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100

collector.sinks = HadoopOut
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /user/root
collector.sinks.HadoopOut.hdfs.callTimeout = 150000
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 10000
collector.sinks.HadoopOut.hdfs.rollInterval = 600

Upvotes: 0

Views: 1873

Answers (1)

Erik Schmiegelow

Reputation: 2759

The problem is the whitespace between "Program" and "Files" in this path:

C:\Program Files\Cloudera\Flume 0.9.3

In your command, %CD% expands to that path, so the unquoted -Dlog4j.configuration argument is split at the space. Java then treats the fragment "Files\Cloudera\Flume" as the main class name, which is exactly the error you see. Install Flume in a path without whitespace and it will work like a charm.
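
For example, with Flume reinstalled to a space-free directory such as C:\Flume (a hypothetical location; adjust to your setup), the launch command could look like the sketch below. Note that the classpath wildcard should also be written as lib\* rather than lib* so Java expands the jars in that folder:

C:\Flume>"C:\Program Files\Java\jre7\bin\java.exe" -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -cp "C:\Flume\lib\*" org.apache.flume.node.Application -f C:\Flume\conf\flume-src.conf -n source_agent

The JRE path can keep its spaces because it is quoted; it was only the unquoted %CD% expansion and the unquoted -f path that were broken apart.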

Upvotes: 2
