Agung Pratama

Reputation: 3784

What is the usage of HDFS configuration in Java?

I am a little confused about the HDFS Java API, especially the role of the Hadoop Configuration object versus the configuration files we put on the Hadoop server installation (/etc/hadoop/core-site.xml, etc.).

  1. Do I need to install Hadoop alongside every Java client program that uses HDFS?
  2. Does configuration set in the Java client affect how it communicates with the Hadoop (HDFS) server?

Upvotes: 1

Views: 1218

Answers (2)

Folam. H

Reputation: 1

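  1. No. Each Java client only needs the Hadoop/HDFS Java client libraries on its classpath, not a full Hadoop installation.
  2. Yes. The client's Configuration tells it which file system to connect to and how to behave as a client. Example: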

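    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsTest {
        // Download a file from HDFS to the local file system
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side settings; these take precedence over site files on the classpath
            conf.set("fs.defaultFS", "hdfs://yourHadoopIP:9000/");
            // Block size for files this client writes: "64m" is 64 MB (a bare "64" would mean 64 bytes)
            conf.set("dfs.blocksize", "64m");

            // Get a client handle for the file system named by fs.defaultFS
            FileSystem fs = FileSystem.get(conf);
            fs.copyToLocalFile(new Path("hdfs://yourHadoopIP:9000/jdk-7u65-linux-i586.tar.gz"), new Path("/root/jdk.tgz"));
            fs.close();
        }
    }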
    

Upvotes: 0

Vignesh I

Reputation: 2221

You can set parameter values either in core-site.xml or through the Configuration object in your driver code. A value set in the program overrides the one set in the XML file. For example, to set a compression codec, you could either add these properties to core-site.xml:

<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>

or add these lines to your driver code:

 Configuration conf = new Configuration();
 conf.set("mapred.compress.map.output", "true");
 conf.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.GzipCodec"); 
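To make the precedence concrete, here is a toy model of the lookup order using only java.util.Properties. It is not the Hadoop API: the class and method names are made up for illustration, and the values are just examples. It only sketches the idea that built-in defaults are consulted last, site-file values next, and values set in the driver first.

```java
import java.util.Properties;

// Toy model only: mimics the lookup order (driver > site file > defaults)
// with chained java.util.Properties. This is NOT Hadoop's Configuration class.
public class ConfigPrecedenceSketch {

    static Properties buildEffectiveConfig() {
        // Layer 1: stand-in for built-in defaults (illustrative values)
        Properties defaults = new Properties();
        defaults.setProperty("mapred.compress.map.output", "false");
        defaults.setProperty("io.file.buffer.size", "4096");

        // Layer 2: stand-in for values from core-site.xml
        Properties site = new Properties(defaults);
        site.setProperty("mapred.compress.map.output", "true");
        site.setProperty("mapred.map.output.compression.codec",
                "org.apache.hadoop.io.compress.GzipCodec");

        // Layer 3: stand-in for conf.set(...) calls in the driver
        Properties driver = new Properties(site);
        driver.setProperty("mapred.map.output.compression.codec",
                "org.apache.hadoop.io.compress.DefaultCodec");
        return driver;
    }

    public static void main(String[] args) {
        Properties conf = buildEffectiveConfig();
        // Set only in the site layer, so the site value is used
        System.out.println(conf.getProperty("mapred.compress.map.output"));
        // Set in both site and driver layers, so the driver value wins
        System.out.println(conf.getProperty("mapred.map.output.compression.codec"));
        // Set only in the defaults layer, so the lookup falls through to it
        System.out.println(conf.getProperty("io.file.buffer.size"));
    }
}
```

In the real API you would check the effective value by calling conf.get("...") on the Configuration object after your conf.set(...) calls.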

And you don't need to install Hadoop on every machine/node. Just install it on your master node and add datanodes by adding their IPs to the list. This would help you in understanding how a multi-node cluster has to be set up.

Upvotes: 1
