RBB

Reputation: 874

Load HDFS file into Spark context

I am new to Spark/Scala and need to load a file from HDFS into Spark. I have a file in HDFS (/newhdfs/abc.txt), and I can view its contents with hdfs dfs -cat /newhdfs/abc.txt

I ran the following, in order, to load the file into the Spark context:

spark-shell   # this opened the Scala REPL

scala> import org.apache.spark._ //Line 1
scala> val conf = new SparkConf().setMaster("local[*]")
scala> val sc = new SparkContext(conf)
scala> val input = sc.textFile("hdfs:///newhdfs/abc.txt") //Line 4

Once I hit enter on line 4, I get the message below:

input: org.apache.spark.rdd.RDD[String] = hdfs:///newhdfs/abc.txt MapPartitionsRDD[19] at textFile at <console>:27

Is this a fatal error? What do I need to do to solve this?

(Using Spark 2.0.0 and Hadoop 2.7.0)

Upvotes: 2

Views: 3569

Answers (1)

gsamaras

Reputation: 73444

This is not an error; it is just the REPL printing the new RDD, whose name includes the file it will read from.

In the Basics section of the Spark docs, there is this example:

scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25

which demonstrates the very same behavior.


Transformations like textFile() are lazy: how would you expect an error to appear before an action triggers the actual work?
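
To see that laziness in action, here is a minimal sketch (the path hdfs:///no/such/file is hypothetical and assumed not to exist): defining the RDD succeeds, and the failure only surfaces once an action runs, roughly like this:

scala> val missing = sc.textFile("hdfs:///no/such/file") // lazy: no HDFS access happens here
missing: org.apache.spark.rdd.RDD[String] = hdfs:///no/such/file MapPartitionsRDD[2] at textFile at <console>:24

scala> missing.count() // the action triggers the read, and only now does Spark complain
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:///no/such/file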

If you want to check that everything is OK, call count() on your input RDD. count() is an action, so it triggers the actual read of the file and returns the number of elements in your RDD.
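
For example (a sketch; the returned number is hypothetical and depends on what abc.txt contains):

scala> input.count() // action: forces the actual read of the file
res0: Long = 42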

Upvotes: 3
