Reputation:
In this case I want to show the header. Why can't I write header.show() in a third line? What do I have to do to view the content of the header variable?
val hospitalDataText = sc.textFile("/Users/bhaskar/Desktop/services.csv")
val header = hospitalDataText.first() //Remove the header
Upvotes: 0
Views: 9153
Reputation: 41987
If you use sparkContext (sc.textFile), you get an RDD. You are getting the error because header is not a dataframe but a plain String pulled from an RDD, and show is applicable only to a dataframe or dataset. You will have to read the text file with sqlContext and not sparkContext.

What you can do is use sqlContext and show(1):
val hospitalDataText = sqlContext.read.csv("/Users/bhaskar/Desktop/services.csv")
hospitalDataText.show(1, false)
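Alternatively, if you want to keep working with the RDD from sc.textFile (the hospitalDataText from the question), a minimal sketch (assuming a Spark 2.x session where the sqlContext.implicits._ shown further down are in scope) is to convert it to a Dataset, since show is available there:

import sqlContext.implicits._

// Convert the RDD[String] to a Dataset[String] so that show can be used
hospitalDataText.toDS.show(1, false)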
Updated for more clarification
sparkContext creates an RDD, which you can see here:
scala> val hospitalDataText = sc.textFile("file:/test/resources/t1.csv")
hospitalDataText: org.apache.spark.rdd.RDD[String] = file:/test/resources/t1.csv MapPartitionsRDD[5] at textFile at <console>:25
And if you use .first(), the first string of the RDD[String] is extracted:
scala> val header = hospitalDataText.first()
header: String = test1,26,BigData,test1
Now, answering your comment below: yes, you can create a dataframe from the header string just created. The following will put the whole string in one column:
scala> val sqlContext = spark.sqlContext
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@3fc736c4
scala> import sqlContext.implicits._
import sqlContext.implicits._
scala> Seq(header).toDF.show(false)
+----------------------+
|value |
+----------------------+
|test1,26,BigData,test1|
+----------------------+
If you want each value in a separate column, you can do
scala> val array = header.split(",")
array: Array[String] = Array(test1, 26, BigData, test1)
scala> Seq((array(0), array(1), array(2), array(3))).toDF().show(false)
+-----+---+-------+-----+
|_1 |_2 |_3 |_4 |
+-----+---+-------+-----+
|test1|26 |BigData|test1|
+-----+---+-------+-----+
You can even define the column names:
scala> Seq((array(0), array(1), array(2), array(3))).toDF("col1", "number", "text2", "col4").show(false)
+-----+------+-------+-----+
|col1 |number|text2 |col4 |
+-----+------+-------+-----+
|test1|26 |BigData|test1|
+-----+------+-------+-----+
A more advanced approach would be to use sqlContext.createDataFrame with a schema defined.
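For instance, a minimal sketch of that approach (the column names and types here are assumptions based on the sample row above, reusing the array from the split earlier):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumed schema for the sample row "test1,26,BigData,test1"
val schema = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("number", IntegerType, nullable = true),
  StructField("text2", StringType, nullable = true),
  StructField("col4", StringType, nullable = true)
))

// Build a Row from the already split header and create the dataframe with the explicit schema
val rowRDD = sc.parallelize(Seq(Row(array(0), array(1).toInt, array(2), array(3))))
val df = sqlContext.createDataFrame(rowRDD, schema)
df.show(false)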
Upvotes: 0
Reputation: 35249
If you want a DataFrame, use DataFrameReader and limit:
spark.read.text(path).limit(1).show
otherwise just use println:
println(header)
Unless of course you want to use cats Show. With cats, add the package to spark.jars.packages and:
import cats.syntax.show._
import cats.instances.string._
sc.textFile(path).first.show
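For the spark.jars.packages part, you can pass the cats dependency when you start the shell, for example (the exact coordinates and version here are an assumption; match them to your Scala and cats versions):

spark-shell --conf spark.jars.packages=org.typelevel:cats-core_2.11:1.0.1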
Upvotes: 1