Reputation: 191
I am running the Twitter sample code and getting the error "value head is not a member of org.apache.spark.sql.Row". Can someone please explain this error in a little more detail?
val tweets = sc.textFile(tweetInput)
println("------------Sample JSON Tweets-------")
for (tweet <- tweets.take(5)) {
  println(gson.toJson(jsonParser.parse(tweet)))
}
val tweetTable = sqlContext.jsonFile(tweetInput).cache()
tweetTable.registerTempTable("tweetTable")
println("------Tweet table Schema---")
tweetTable.printSchema()
println("----Sample Tweet Text-----")
sqlContext.sql("SELECT text FROM tweetTable LIMIT 10").collect().foreach(println)
println("------Sample Lang, Name, text---")
sqlContext.sql("SELECT user.lang, user.name, text FROM tweetTable LIMIT 1000").collect().foreach(println)
println("------Total count by languages Lang, count(*)---")
sqlContext.sql("SELECT user.lang, COUNT(*) as cnt FROM tweetTable GROUP BY user.lang ORDER BY cnt DESC LIMIT 25").collect.foreach(println)
println("--- Training the model and persist it")
val texts = sqlContext.sql("SELECT text from tweetTable").map(_.head.toString)
// Cache the vectors RDD since it will be used for all the KMeans iterations.
val vectors = texts.map(Utils.featurize).cache()
Upvotes: 0
Views: 3457
Reputation: 4055
I think your problem is that the sql method returns a Dataset of Rows. Therefore the _ represents a Row, and Row doesn't have a head method (which explains the error message).
To access items in a Row you can do one of the following:
// get the first element in the Row
val texts = sqlContext.sql("...").map(_.get(0))
// get the first element as an Int (the column must actually hold an Int; for a text column use getString(0))
val texts = sqlContext.sql("...").map(_.getInt(0))
See here for more info: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Row.html
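To make the options above concrete, here is a minimal sketch of the positional accessors on Row, built around a hand-made Row so it can be tried without the tweet data. The object name RowAccessSketch and the sample text are made up for illustration; only get, getString, getAs and toSeq come from the Row API.

```scala
import org.apache.spark.sql.Row

object RowAccessSketch {
  // Hypothetical stand-in for one row of "SELECT text FROM tweetTable".
  val row: Row = Row("sample tweet text")

  // Untyped access: returns Any, so a cast or toString is needed afterwards.
  val untyped: Any = row.get(0)

  // Typed access for a string column, which is what the text column holds.
  val typed: String = row.getString(0)

  // Generic typed access; the type parameter must match the stored value.
  val generic: String = row.getAs[String](0)

  // Row.toSeq yields a Seq[Any], and Seq does have a head method.
  val viaSeq: String = row.toSeq.head.toString

  def main(args: Array[String]): Unit =
    println(Seq(untyped, typed, generic, viaSeq).mkString("\n"))
}
```

Applied to the question, the failing line would probably become sqlContext.sql("SELECT text from tweetTable").map(_.getString(0)), assuming the text column is a string. On Spark 2.x, mapping over a Dataset also needs an implicit Encoder in scope (for example via import spark.implicits._, where spark is your SparkSession), so the exact form depends on the Spark version in use.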
Upvotes: 1