Girish Bhat M

Reputation: 392

HDFS file access in Spark

I am developing an application where I read a file from Hadoop, process it, and store the data back to Hadoop. I am confused about the proper HDFS file path format. When I read an HDFS file from the Spark shell like

val file=sc.textFile("hdfs:///datastore/events.txt")

it works fine and I am able to read it.

But when I submit a jar containing the same code to YARN, it gives the following error:

org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/datastore/events.txt

When I add the namenode address, as in hdfs://namenodeserver/datastore/events.txt, everything works.

I am a bit confused about this behaviour and need some guidance.

Note: I am using an AWS EMR setup and all the configurations are default.

Upvotes: 2

Views: 6549

Answers (2)

Girish Bhat M

Reputation: 392

Problem solved. As I debugged further, the fs.defaultFS property from core-site.xml was not used when I passed the path as hdfs:///path/to/file, even though all the Hadoop config properties were loaded (I logged the sparkContext.hadoopConfiguration object).

As a workaround, I manually read the property with sparkContext.hadoopConfiguration().get("fs.defaultFS") and prepended it to the path.

I don't know if this is the correct way of doing it.
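
For reference, a minimal sketch of that workaround (the file path here is just an example):

val defaultFs = sc.hadoopConfiguration.get("fs.defaultFS") // e.g. hdfs://namenodeserver:8020
val file = sc.textFile(defaultFs + "/datastore/events.txt") // fully qualified URI with authority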

Upvotes: 1

kishan singh

Reputation: 56

If you want to use sc.textFile("hdfs://..."), you need to give the full path (absolute path); in your example, that would be "nn1home:8020/..".

If you want to make it simple, then just use sc.textFile("hdfs:/input/war-and-peace.txt")

That's only one /.

I think it will work.
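
For example (the namenode host, port, and file name are placeholders):

sc.textFile("hdfs://nn1home:8020/input/war-and-peace.txt") // fully qualified: scheme + namenode authority + path
sc.textFile("hdfs:/input/war-and-peace.txt") // single slash; resolved against fs.defaultFS when it supplies the hdfs authority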

Upvotes: 1
