Reputation: 874
I am new to Spark/Scala and need to load a file from HDFS into Spark. I have a file in HDFS (/newhdfs/abc.txt), and I can see its contents with hdfs dfs -cat /newhdfs/abc.txt.
I ran the commands below, in this order, to load the file into a Spark context:
spark-shell # this opened the Scala console
scala> import org.apache.spark._; //Line 1
scala> val conf=new SparkConf().setMaster("local[*]");
scala> val sc = new SparkContext(conf);
scala> val input=sc.textFile("hdfs:///newhdfs/abc.txt"); //Line 4
Once I hit enter on line 4, I get the message below:
input: org.apache.spark.rdd.RDD[String] = hdfs:///newhdfs/abc.txt MapPartitionsRDD[19] at textFile at <console>:27
Is this a fatal error? What do I need to do to solve this?
(Using Spark-2.0.0 and Hadoop 2.7.0)
Upvotes: 2
Views: 3569
Reputation: 73444
This is not an error; it is just the string representation of your RDD, showing the file it was created from.
The basic Spark docs contain this example:
scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25
which demonstrates the very same behavior.
How would you expect an error to occur without an action triggering the actual work? Spark evaluates RDDs lazily, so textFile only records where the data lives; nothing has been read yet.
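For instance, even a path that does not exist produces no error at this point (hypothetical path; the RDD id and console line in the output will vary):
scala> val bad = sc.textFile("hdfs:///no/such/file") // no error yet: textFile is lazy
bad: org.apache.spark.rdd.RDD[String] = hdfs:///no/such/file MapPartitionsRDD[1] at textFile at <console>:24
The failure only shows up once an action, such as bad.count(), forces Spark to actually open the file.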
If you want to check that everything is OK, call count on your input RDD: count is an action, so it triggers the actual read of the file and returns the number of elements in your RDD.
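A minimal sketch of that check, reusing the input RDD from the question:
scala> input.count() // count is an action, so Spark reads the file here
res0: Long = ... // the number of lines in abc.txt
If count returns without an exception and the number matches your file, the load worked.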
Upvotes: 3