Reputation: 849
I have some data in HDFS under /user/Cloudera/Test/*. I can see the records just fine by running hdfs dfs -cat /user/Cloudera/Test/*.
Now I need to read that same data as an RDD in Scala. I have tried the following in the Scala shell:
val file = sc.textFile("hdfs://quickstart.cloudera:8020/user/Cloudera/Test")
Then I wrote a filter and a for loop to read the words. But when I finally call println, it says the file is not found.
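For context, the follow-up code is along these lines (this is an illustrative sketch, not the exact snippet):

```scala
// Illustrative sketch of the filter/println attempt.
// `sc` is the SparkContext provided by the Scala (spark) shell.
val file = sc.textFile("hdfs://quickstart.cloudera:8020/user/Cloudera/Test")
val words = file.flatMap(line => line.split("\\s+")) // split each line into words
val filtered = words.filter(word => word.nonEmpty)   // drop empty tokens
filtered.collect().foreach(println)                  // action: this is where the error surfaces
```

Note that Spark evaluates RDDs lazily, so a bad path is only reported when an action like collect() or count() runs, which is why the error appears at the println step rather than at sc.textFile.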
Can anyone please help me work out what the HDFS URL should be in this case? Note: I am using the Cloudera CDH5.0 VM.
Upvotes: 5
Views: 19823
Reputation: 1006
If you are trying to access your file in a Spark job, then you can simply use the path:
val file = sc.textFile("/user/Cloudera/Test")
Spark will resolve this path automatically. You do not need to add a scheme and host as a prefix, because a Spark job reads from the default HDFS filesystem configured for the cluster.
Hope this solves your query.
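A minimal end-to-end sketch of this approach (the non-empty-line filter is illustrative):

```scala
// Assumes an active SparkContext `sc` (created automatically in spark-shell)
// on a cluster whose default filesystem is HDFS, so the bare path resolves
// against fs.defaultFS without any hdfs://host:port prefix.
val file = sc.textFile("/user/Cloudera/Test")
val nonEmpty = file.filter(line => line.trim.nonEmpty) // keep non-blank lines
println(s"non-empty lines: ${nonEmpty.count()}")       // count() is the action that triggers the read
```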
Upvotes: 3
Reputation: 16
Instead of using "quickstart.cloudera" and the port, try using just the IP address:
val file = sc.textFile("hdfs://<ip>/user/Cloudera/Test")
Upvotes: 0