Naveen

How to access files from HDFS in spark-shell on a Cloudera single-node cluster

While trying to create an RDD from spark-shell on a Cloudera cluster, I get an error when accessing files from an HDFS location:

scala> val file = sc.textFile("hdfs://user/cloudera/nvegesn/emp.txt")
<console>:13: error: not found: value sc

Answers (1)

ssuperczynski

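You forgot to define a SparkContext. (spark-shell normally creates sc for you at startup, so if it is missing, the shell's startup log usually shows why the context failed to initialize.)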

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)

For example:

/* SimpleApp.scala */
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // "hdfs:///" (three slashes) resolves against the cluster's default
    // file system (fs.defaultFS). With "hdfs://user/...", "user" would be
    // parsed as the NameNode host name instead of a directory.
    val logFile = "hdfs:///user/cloudera/nvegesn/emp.txt"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Read the file as an RDD of lines (at least 2 partitions) and cache it,
    // since it is scanned twice below.
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}
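Separately, the path in the question is probably not what was intended. In hdfs://user/cloudera/nvegesn/emp.txt the part right after "//" is the URI authority, so "user" is read as the NameNode host and the file path becomes /cloudera/nvegesn/emp.txt. Writing hdfs:///user/cloudera/nvegesn/emp.txt (empty authority, so the default file system from fs.defaultFS is used), or giving the NameNode host and port explicitly, avoids this. The minimal sketch below uses only java.net.URI (which Hadoop's Path class also uses internally to parse paths), so it runs without Spark or a cluster; the object name HdfsUriCheck and its helpers are hypothetical, just for illustration.

```scala
import java.net.URI

// Hypothetical helper object: shows how a URI parser splits an HDFS path string.
object HdfsUriCheck {
  // Host component, or None when the authority is empty (e.g. "hdfs:///...").
  def host(path: String): Option[String] = Option(new URI(path).getHost)

  // Path component, i.e. what would actually be looked up on the file system.
  def filePath(path: String): String = new URI(path).getPath

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "hdfs://user/cloudera/nvegesn/emp.txt",  // "user" becomes the host
      "hdfs:///user/cloudera/nvegesn/emp.txt"  // empty authority: default FS
    )
    for (p <- candidates) {
      val h = host(p).getOrElse("<default fs>")
      println(s"$p -> host = $h, path = ${filePath(p)}")
    }
  }
}
```

Running main prints one line per candidate; only the three-slash form keeps /user/cloudera/nvegesn/emp.txt as the path.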
