user8331236

error: value show is not a member of String

In this case I want to display the header. Why can't I write header.show() on the third line? What do I have to do to view the content of the header variable?

val hospitalDataText = sc.textFile("/Users/bhaskar/Desktop/services.csv")
val header = hospitalDataText.first() // first line of the file (the header)

Upvotes: 0

Views: 9153

Answers (2)

Ramesh Maharjan

Reputation: 41987

If you use sparkContext (sc.textFile), you get an RDD[String]. You are getting the error because header is a plain String (the first element of that RDD), not a DataFrame, and show is defined only on DataFrame and Dataset.

To use show, read the file with sqlContext instead of sparkContext.

For example, use sqlContext and show(1):

val hospitalDataText = sqlContext.read.csv("/Users/bhaskar/Desktop/services.csv")
hospitalDataText.show(1, false)
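Since the goal is to see the header, it may also help to let Spark treat the first line as column names. A minimal sketch, assuming Spark 2.x or later; to keep it runnable it writes a small hypothetical CSV to a temp file (the column names col1, number, text2, col4 are made up for illustration). With the header option set to true, the header values become df.columns instead of a data row:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in spark-shell getOrCreate() returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("csv-header").getOrCreate()

// Hypothetical sample file standing in for services.csv.
val tmp = Files.createTempFile("services", ".csv")
Files.write(tmp, "col1,number,text2,col4\ntest1,26,BigData,test1\n".getBytes("UTF-8"))

// header=true: the first line is used as column names, not returned as data.
val df = spark.read.option("header", "true").csv(tmp.toString)

// The header is now available as the column names.
println(df.columns.mkString(","))

// First data row, without truncating long values.
df.show(1, false)
```

Without inferSchema, every column is read as a string; this keeps the example close to what sc.textFile would give.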

Update, for clarification:

sparkContext creates an RDD, as the REPL output shows:

scala> val hospitalDataText = sc.textFile("file:/test/resources/t1.csv")
hospitalDataText: org.apache.spark.rdd.RDD[String] = file:/test/resources/t1.csv MapPartitionsRDD[5] at textFile at <console>:25

Calling .first() extracts the first element of the RDD[String], which is a plain String:

scala> val header = hospitalDataText.first()
header: String = test1,26,BigData,test1
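Because header is just a String, println is enough to view it, and the original "Remove the header" intent can be handled by filtering that String out of the RDD. A minimal sketch, assuming a local SparkSession and a hypothetical temp file in place of the real CSV:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("drop-header").getOrCreate()

// Hypothetical sample file: one header line followed by one data line.
val tmp = Files.createTempFile("t1", ".csv")
Files.write(tmp, "col1,number,text2,col4\ntest1,26,BigData,test1\n".getBytes("UTF-8"))

val lines = spark.sparkContext.textFile(tmp.toString)

// first() is an action returning the first element: a String, not an RDD or DataFrame.
val header: String = lines.first()
println(header) // String has no show(); printing it is all that is needed

// Keep every line that is not the header.
val data = lines.filter(_ != header)
data.collect().foreach(println)
```

Note that filter(_ != header) also drops any data line that happens to be identical to the header; if that matters, dropping only the first line of partition 0 (for example with mapPartitionsWithIndex) is an alternative.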

To answer your comment below: yes, you can create a DataFrame from the header string you just created.

The following puts the whole string into a single column:

scala> val sqlContext = spark.sqlContext
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@3fc736c4

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> Seq(header).toDF.show(false)
+----------------------+
|value                 |
+----------------------+
|test1,26,BigData,test1|
+----------------------+

If you want each comma-separated value in its own column, split the string first:

scala> val array = header.split(",")
array: Array[String] = Array(test1, 26, BigData, test1)

scala> Seq((array(0), array(1), array(2), array(3))).toDF().show(false)
+-----+---+-------+-----+
|_1   |_2 |_3     |_4   |
+-----+---+-------+-----+
|test1|26 |BigData|test1|
+-----+---+-------+-----+

You can also give the columns names:

scala> Seq((array(0), array(1), array(2), array(3))).toDF("col1", "number", "text2", "col4").show(false)
+-----+------+-------+-----+
|col1 |number|text2  |col4 |
+-----+------+-------+-----+
|test1|26    |BigData|test1|
+-----+------+-------+-----+

A more advanced approach is to use sqlContext.createDataFrame with an explicitly defined schema.
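A minimal sketch of the createDataFrame approach, assuming Spark 2.x or later. The column names and the choice of IntegerType for the second field are illustrative assumptions, not something the data dictates; the Row values must match the schema types, so the second value is converted with toInt:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("schema-demo").getOrCreate()
val sqlContext = spark.sqlContext

// The header string from the REPL session above.
val header = "test1,26,BigData,test1"
val array = header.split(",")

// Explicit schema with hypothetical column names; "number" is typed as an integer.
val schema = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("number", IntegerType, nullable = true),
  StructField("text2", StringType, nullable = true),
  StructField("col4", StringType, nullable = true)
))

// One Row whose field types line up with the schema.
val rows = spark.sparkContext.parallelize(Seq(Row(array(0), array(1).toInt, array(2), array(3))))

val df = sqlContext.createDataFrame(rows, schema)
df.printSchema()
df.show(false)
```

The toInt conversion throws a NumberFormatException if the field is not numeric, so real input may need a guard before building the Row.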

Upvotes: 0

Alper t. Turker

Reputation: 35249

If you want a DataFrame, use DataFrameReader and limit:

spark.read.text(path).limit(1).show

Otherwise, just println the String:

println(header)

Unless, of course, you want to use the cats Show type class. Add the cats package to spark.jars.packages and then:

import cats.syntax.show._
import cats.instances.string._

sc.textFile(path).first.show

Upvotes: 1
