vamper1234

Dataproc HDFS file URIs

How do I get the path/URL of a file stored in Dataproc HDFS? I want to run a MapReduce job on a file that is located in Dataproc HDFS.

Answers (1)

Dagang Wei

The following are all valid HDFS URIs in a Dataproc cluster:

  1. hdfs://<master-hostname>:8020/<path-to-file>
  2. hdfs://<master-hostname>/<path-to-file>
  3. hdfs:///<path-to-file>
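
To illustrate that all three forms point at the same file, here is a minimal Python sketch that expands each one to a fully qualified hdfs://host:port/path URI. It assumes fs.defaultFS is hdfs://my-cluster-m (a hypothetical master hostname) and that the default NameNode port is 8020. The qualify helper is hypothetical and is not a Hadoop API; Hadoop performs this resolution internally.

```python
from urllib.parse import urlsplit, urlunsplit

# Default NameNode port, used when a URI omits the port.
DEFAULT_NAMENODE_PORT = 8020


def qualify(uri: str, default_fs: str) -> str:
    """Hypothetical helper: expand an HDFS URI to hdfs://host:port/path form.

    Only absolute paths are handled; Hadoop resolves relative paths against
    the user's home directory, which this sketch does not model.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("", "hdfs"):
        raise ValueError(f"not an HDFS URI: {uri}")
    if not parts.path.startswith("/"):
        raise ValueError(f"expected an absolute path: {uri}")
    # No authority (hdfs:///path or a bare /path): borrow it from fs.defaultFS.
    authority = parts.netloc or urlsplit(default_fs).netloc
    host, _, port = authority.partition(":")
    # No port in the authority: fall back to the default NameNode port.
    return urlunsplit(("hdfs", f"{host}:{port or DEFAULT_NAMENODE_PORT}", parts.path, "", ""))


if __name__ == "__main__":
    default_fs = "hdfs://my-cluster-m"  # hypothetical master hostname
    for u in (
        "hdfs://my-cluster-m:8020/user/data/input.txt",
        "hdfs://my-cluster-m/user/data/input.txt",
        "hdfs:///user/data/input.txt",
    ):
        print(qualify(u, default_fs))  # each prints hdfs://my-cluster-m:8020/user/data/input.txt
```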

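The third one works because, by default, every node of a Dataproc cluster sets the fs.defaultFS property to hdfs://<master-hostname> in /etc/hadoop/conf/core-site.xml, so a URI without a host resolves against the master. The second one works because 8020 is the default NameNode port, so the port can be omitted. The relevant entry in core-site.xml is: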

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://<master-hostname></value>
    <description>
      The name of the default file system. A URI whose scheme and authority
      determine the FileSystem implementation. The uri's scheme determines
      the config property (fs.SCHEME.impl) naming the FileSystem
      implementation class. The uri's authority is used to determine the
      host, port, etc. for a filesystem.
    </description>
  </property>
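
To check the actual value on a given node, a minimal Python sketch using only the standard library can read fs.defaultFS out of core-site.xml. The function names and the sample hostname my-cluster-m are hypothetical; the sketch only handles plain <property> entries and ignores Hadoop features such as XInclude and ${...} variable substitution.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Config path on Dataproc nodes, as stated in the answer above.
CORE_SITE = "/etc/hadoop/conf/core-site.xml"


def get_hadoop_property(root: ET.Element, name: str) -> Optional[str]:
    """Return the <value> of the first <property> whose <name> matches, else None."""
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def default_fs(path: str = CORE_SITE) -> Optional[str]:
    """Read fs.defaultFS from a core-site.xml file on disk."""
    return get_hadoop_property(ET.parse(path).getroot(), "fs.defaultFS")


if __name__ == "__main__":
    # Inline sample so the sketch runs off-cluster; on a node, call default_fs() instead.
    sample = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://my-cluster-m</value>
  </property>
</configuration>"""
    print(get_hadoop_property(ET.fromstring(sample), "fs.defaultFS"))  # hdfs://my-cluster-m
```

On a cluster node, hdfs getconf -confKey fs.defaultFS reports the same value directly, with Hadoop's own configuration handling applied.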

You can run hadoop fs -ls <uri> on any node of the cluster to list the files at that URI.
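
Since the goal is to run a MapReduce job, here is a minimal Python sketch that lists an HDFS directory and then launches the bundled WordCount example on a file addressed with the short hdfs:/// form. The examples jar location is an assumption based on Dataproc images, and the input and output paths are hypothetical; each command is skipped if hadoop is not on the PATH.

```python
import shutil
import subprocess
from typing import List

# Assumption: location of the MapReduce examples jar on Dataproc images.
EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"


def ls_command(uri: str) -> List[str]:
    """Build `hadoop fs -ls <uri>`."""
    return ["hadoop", "fs", "-ls", uri]


def wordcount_command(input_uri: str, output_uri: str, jar: str = EXAMPLES_JAR) -> List[str]:
    """Build `hadoop jar <jar> wordcount <in> <out>`.

    The output directory must not exist yet, or the job fails at submission.
    """
    return ["hadoop", "jar", jar, "wordcount", input_uri, output_uri]


def run(cmd: List[str]) -> None:
    """Run the command if the hadoop CLI is available; otherwise just report it."""
    if shutil.which(cmd[0]) is None:
        print("skipping (hadoop not on PATH):", " ".join(cmd))
        return
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; hdfs:/// resolves via fs.defaultFS to the master's NameNode.
    run(ls_command("hdfs:///user/data/"))
    run(wordcount_command("hdfs:///user/data/input.txt", "hdfs:///user/data/wordcount-out"))
```

The fully qualified hdfs://<master-hostname>:8020/... form can be passed in the same way; the short form is simply more portable across clusters.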
