Srikant Sahu

Reputation: 849

Url for HDFS file system

I have some data in HDFS under /user/Cloudera/Test/. I can see the records fine by running hdfs dfs -cat /user/Cloudera/Test/*.

Now I need to read the same data as an RDD in Scala. I tried the following in the Scala shell:

val file = sc.textFile("hdfs://quickstart.cloudera:8020/user/Cloudera/Test")

Then I wrote a filter and a for loop to read the words, but when I finally call println, it says the file was not found.
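For reference, a minimal sketch of that kind of code (the split and filter logic here is my assumption, not the exact code). Note that RDDs are lazy: sc.textFile does not touch HDFS by itself, so a bad URL only fails later, when an action like the final println runs:

val file = sc.textFile("hdfs://quickstart.cloudera:8020/user/Cloudera/Test")
val words = file.flatMap(line => line.split("\\s+"))  // split each line into words
val filtered = words.filter(word => word.nonEmpty)    // drop empty tokens
filtered.collect().foreach(println)                   // action: the URL is only resolved here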

Can anyone please help me figure out what the HDFS URL should be in this case? Note: I am using the Cloudera CDH5.0 VM.

Upvotes: 5

Views: 19823

Answers (2)

siddhartha jain

Reputation: 1006

If you are trying to access the file from a Spark job, you can simply use the path:

val file = sc.textFile("/user/Cloudera/Test") 

Spark will detect the file automatically. You do not need to add an hdfs://host:port prefix, because Spark resolves bare paths against the default HDFS filesystem (the fs.defaultFS setting).
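For example, in the spark-shell on the quickstart VM something like this should work (a sketch; take(5) is just one way to force an action and verify the path resolves):

val file = sc.textFile("/user/Cloudera/Test")  // bare path, resolved against fs.defaultFS
file.take(5).foreach(println)                  // action: fails here if the path is wrong

This works because fs.defaultFS in core-site.xml already points at the cluster's NameNode, so paths without a scheme go to HDFS rather than the local filesystem.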

Hope this solves your query.

Upvotes: 3

user7432598

Reputation: 16

Instead of using "quickstart.cloudera" and the port, try just the IP address:

val file = sc.textFile("hdfs://<ip>/user/Cloudera/Test")

Upvotes: 0
