Libin

Reputation: 165

How to write data to HA Hadoop QJM using Apache FLUME?

How will Flume identify the active NameNode so that data is written to HDFS? Without High Availability, Hadoop has a single NameNode whose IP we configure in flume.conf, so the data is easily directed to HDFS. In our case, however, Flume should identify the active and standby NameNodes and direct the data to the active one.
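For example, in a non-HA setup the NameNode is hard-coded in the sink path like this (agent and host names are illustrative):

    # flume.conf - non-HA: the single NameNode host is hard-coded in the sink path
    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.hdfs.path = hdfs://namenode-host:8020/flume/events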

Upvotes: 2

Views: 909

Answers (3)

Hayati İbiş

Reputation: 127

With respect to Pilgrim's answer, you can place only the hdfs-site.xml config file on your Flume classpath. Simply copy this file to the $APACHE_FLUME_HOME/conf/ directory, or add FLUME_CLASSPATH="/where/is/your/hdfs-site.xml" to flume-env.sh.

You have to be sure that your Hadoop nameservice configuration is suitable for this.
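As a minimal sketch, the HA-related properties in hdfs-site.xml look roughly like this, assuming a nameservice called mycluster with NameNodes nn1 and nn2 (all names and hosts are illustrative):

    <!-- hdfs-site.xml: HA nameservice definition (names are illustrative) -->
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>namenode1-host:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>namenode2-host:8020</value>
    </property>
    <!-- lets the HDFS client discover which NameNode is currently active -->
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

With this file on the classpath, the HDFS sink path can refer to hdfs://mycluster/... and the HDFS client library resolves the active NameNode by itself.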

Upvotes: 1

Andrey

Reputation: 800

This works for me (Hadoop 2.7.1, Flume 1.6.0): place the Hadoop *-site.xml config files on your Flume classpath.

I'm not sure which of them is strictly required (I placed core-site, hdfs-site, yarn-site, and mapred-site), but the settings for the cluster name are in core-site.xml.
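For reference, the relevant core-site.xml entry and the matching sink path would look roughly like this (the nameservice name mycluster is illustrative):

    <!-- core-site.xml: the default filesystem points at the HA nameservice -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>

    # flume.conf: the sink path uses the nameservice instead of a host:port
    agent.sinks.hdfsSink.hdfs.path = hdfs://mycluster/flume/events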

Upvotes: 0

frb

Reputation: 3798

AFAIK it is not possible in a direct way: the HDFS sink configuration only has room for one NameNode.

Nevertheless, I think you can configure two HDFS sinks (and two channels), each one pointing to one of the NameNodes. The source will put a copy of each event into both channels thanks to the default replicating channel selector, so each sink will try to persist the data on its own; the one pointing to the standby NameNode will not persist anything until the active one goes down and the standby becomes active.
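A sketch of that layout, with hypothetical agent, channel, and sink names and two illustrative NameNode hosts:

    # flume.conf - one source replicating into two channels, each drained
    # by an HDFS sink that points at a different NameNode
    agent.sources = src1
    agent.channels = ch1 ch2
    agent.sinks = sink1 sink2

    agent.sources.src1.type = netcat            # any source works; netcat is illustrative
    agent.sources.src1.bind = localhost
    agent.sources.src1.port = 44444
    agent.sources.src1.channels = ch1 ch2
    agent.sources.src1.selector.type = replicating   # the default channel selector

    agent.channels.ch1.type = memory
    agent.channels.ch2.type = memory

    agent.sinks.sink1.type = hdfs
    agent.sinks.sink1.channel = ch1
    agent.sinks.sink1.hdfs.path = hdfs://nn1-host:8020/flume/events

    agent.sinks.sink2.type = hdfs
    agent.sinks.sink2.channel = ch2
    agent.sinks.sink2.hdfs.path = hdfs://nn2-host:8020/flume/events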

HTH!

Upvotes: 2
