Reputation: 392
I am developing an application where I read a file from Hadoop, process it, and store the data back to Hadoop. I am confused about what the proper HDFS file path format should be. When reading an HDFS file from the Spark shell like
val file=sc.textFile("hdfs:///datastore/events.txt")
it works fine and I am able to read it.
But when I submit a jar containing the same code to YARN, it gives the error
org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/datastore/events.txt
When I add the name node address, as in hdfs://namenodeserver/datastore/events.txt,
everything works.
I am a bit confused about this behaviour and need some guidance.
Note: I am using an AWS EMR setup and all configurations are default.
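For reference, here is a minimal sketch of the job I submit to YARN (the object name, the "processing" step, and the output path are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object EventsJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("EventsJob"))

    // Works in spark-shell, but fails on YARN with "Uri without authority":
    // val file = sc.textFile("hdfs:///datastore/events.txt")

    // Works on YARN once the name node authority is given explicitly:
    val file = sc.textFile("hdfs://namenodeserver/datastore/events.txt")

    val processed = file.filter(_.nonEmpty) // placeholder for the real processing
    processed.saveAsTextFile("hdfs://namenodeserver/datastore/events_out")
  }
}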
Upvotes: 2
Views: 6549
Reputation: 392
Problem solved. As I debugged further, the fs.defaultFS property from core-site.xml was not being used when I just passed the path as hdfs:///path/to/file. But all the Hadoop config properties were loaded (I logged the sparkContext.hadoopConfiguration object to verify).
As a workaround I manually read the property with sparkContext.hadoopConfiguration.get("fs.defaultFS") and prepended it to the path.
I don't know if it is the correct way of doing it.
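A minimal sketch of that workaround, assuming the input paths come in without scheme and authority (the qualify helper is just a name I made up):

// Resolve fs.defaultFS ourselves and prepend it to bare paths.
val defaultFs = sc.hadoopConfiguration.get("fs.defaultFS") // e.g. hdfs://namenodeserver:8020

def qualify(path: String): String =
  if (path.startsWith("hdfs://")) path
  else defaultFs.stripSuffix("/") + "/" + path.stripPrefix("/")

val events = sc.textFile(qualify("/datastore/events.txt"))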
Upvotes: 1
Reputation: 56
If you want to use sc.textFile("hdfs://...") you need to give the full (absolute) path; in your example that would be "nn1home:8020/.."
If you want to make it simple, then just use sc.textFile("hdfs:/input/war-and-peace.txt")
That's only one /
I think it will work.
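In code, the two variants I mean look like this (the name node host and port are just examples):

// Fully qualified path with the name node authority:
val a = sc.textFile("hdfs://nn1home:8020/input/war-and-peace.txt")

// Single slash after the scheme, relying on the configured default filesystem:
val b = sc.textFile("hdfs:/input/war-and-peace.txt")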
Upvotes: 1