Reputation:
In this case I want to show the header. Why can't I write header.show() in a third line? What do I have to do to view the content of the header variable?
val hospitalDataText = sc.textFile("/Users/bhaskar/Desktop/services.csv")
val header = hospitalDataText.first() //Remove the header
Upvotes: 0
Views: 9153
Reputation: 41987
If you use sparkContext (sc.textFile), you get an RDD. You are getting the error because header is not a dataframe but a plain String pulled from an RDD, and show is applicable only to a dataframe or dataset. You will have to read the text file with sqlContext and not sparkContext.

What you can do is use sqlContext and show(1):
val hospitalDataText = sqlContext.read.csv("/Users/bhaskar/Desktop/services.csv")
hospitalDataText.show(1, false)
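Alternatively, if you want to keep working with the RDD from sc.textFile (the hospitalDataText from the question), a minimal sketch (assuming a Spark 2.x session where the sqlContext.implicits._ shown further down are in scope) is to convert it to a Dataset, since show is available there:

import sqlContext.implicits._

// Convert the RDD[String] to a Dataset[String] so that show can be used
hospitalDataText.toDS.show(1, false)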
Updated for more clarification
sparkContext creates an RDD, which you can see here:
scala> val hospitalDataText = sc.textFile("file:/test/resources/t1.csv")
hospitalDataText: org.apache.spark.rdd.RDD[String] = file:/test/resources/t1.csv MapPartitionsRDD[5] at textFile at <console>:25
And if you use .first(), the first string of the RDD[String] is extracted:
scala> val header = hospitalDataText.first()
header: String = test1,26,BigData,test1
Now, answering your comment below: yes, you can create a dataframe from the header string just created. The following will put the whole string in one column:
scala> val sqlContext = spark.sqlContext
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@3fc736c4
scala> import sqlContext.implicits._
import sqlContext.implicits._
scala> Seq(header).toDF.show(false)
+----------------------+
|value |
+----------------------+
|test1,26,BigData,test1|
+----------------------+
If you want each value in a separate column, you can do
scala> val array = header.split(",")
array: Array[String] = Array(test1, 26, BigData, test1)
scala> Seq((array(0), array(1), array(2), array(3))).toDF().show(false)
+-----+---+-------+-----+
|_1 |_2 |_3 |_4 |
+-----+---+-------+-----+
|test1|26 |BigData|test1|
+-----+---+-------+-----+
You can even define the column names:
scala> Seq((array(0), array(1), array(2), array(3))).toDF("col1", "number", "text2", "col4").show(false)
+-----+------+-------+-----+
|col1 |number|text2 |col4 |
+-----+------+-------+-----+
|test1|26 |BigData|test1|
+-----+------+-------+-----+
A more advanced approach would be to use sqlContext.createDataFrame with a schema defined.
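For instance, a minimal sketch of that approach (the column names and types here are assumptions based on the sample row above, reusing the array from the split earlier):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumed schema for the sample row "test1,26,BigData,test1"
val schema = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("number", IntegerType, nullable = true),
  StructField("text2", StringType, nullable = true),
  StructField("col4", StringType, nullable = true)
))

// Build a Row from the already split header and create the dataframe with the explicit schema
val rowRDD = sc.parallelize(Seq(Row(array(0), array(1).toInt, array(2), array(3))))
val df = sqlContext.createDataFrame(rowRDD, schema)
df.show(false)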
Upvotes: 0
Reputation: 35249
If you want a DataFrame, use DataFrameReader and limit:
spark.read.text(path).limit(1).show
otherwise just use println:
println(header)
Unless of course you want to use cats Show. With cats, add the package to spark.jars.packages and:
import cats.syntax.show._
import cats.instances.string._
sc.textFile(path).first.show
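For the spark.jars.packages part, you can pass the cats dependency when you start the shell, for example (the exact coordinates and version here are an assumption; match them to your Scala and cats versions):

spark-shell --conf spark.jars.packages=org.typelevel:cats-core_2.11:1.0.1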
Upvotes: 1