Reputation: 60004
I have an Array[org.apache.spark.sql.Row] returned by sqc.sql(sqlcmd).collect():
Array([10479,6,10], [8975,149,640], ...)
I can get the individual values:
scala> pixels(0)(0)
res34: Any = 10479
but they are Any, not Int.
How do I extract them as Int?
The most obvious solution did not work:
scala> pixels(0).getInt(0)
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
PS. I can do pixels(0)(0).toString.toInt or pixels(0).getString(0).toInt, but they feel wrong...
Upvotes: 16
Views: 40150
Reputation: 61
The other answer is relevant. You don't need to use collect; instead, call the getInt and getString methods on each row, and use getAs when the data type is complex:
val popularHashTags = sqlContext.sql("SELECT hashtags, usersMentioned, Url FROM tweets")
var hashTagsList = popularHashTags.flatMap ( x => x.getAs[Seq[String]](0))
Upvotes: 0
Reputation: 67065
Using getInt should work. Here is a contrived example as a proof of concept:
import org.apache.spark.sql._
sc.parallelize(Array(1,2,3)).map(Row(_)).collect()(0).getInt(0)
This returns 1. However,
sc.parallelize(Array("1","2","3")).map(Row(_)).collect()(0).getInt(0)
fails. So it looks like the value is coming in as a String, and you will have to convert it to an Int manually:
sc.parallelize(Array("1","2","3")).map(Row(_)).collect()(0).getString(0).toInt
The documentation states that getInt:
Returns the value of column i as an int. This function will throw an exception if the value is at i is not an integer, or if it is null.
So it seems it will not try to cast for you.
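If the column type is not known in advance, one option is to pattern match on the Any value rather than cast it blindly. The following is only a minimal sketch in plain Scala, with no Spark dependency: toIntOpt is a hypothetical helper (not part of the Row API), and cells stands in for the values of one collected row. It is written for the REPL or spark-shell.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark): convert a loosely typed cell to Int.
def toIntOpt(value: Any): Option[Int] = value match {
  case i: Int    => Some(i)                                   // already an Int
  case l: Long   => if (l.isValidInt) Some(l.toInt) else None // only if it fits in an Int
  case s: String => Try(s.trim.toInt).toOption                // None if the text is not numeric
  case _         => None                                      // null or an unsupported type
}

// Stand-in for one collected row whose columns came back as strings.
val cells: Seq[Any] = Seq("10479", "6", "10")
val ints: Seq[Option[Int]] = cells.map(toIntOpt)
```

Returning Option[Int] instead of throwing makes bad or null cells explicit, at the cost of having to unwrap the result.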
Upvotes: 14
Reputation: 1838
The Row class (also see https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.package) has methods getInt(i: Int), getDouble(i: Int), etc.
Also note that a SchemaRDD is an RDD[Row] plus a schema that tells you which column has which data type. If you call .collect(), you will only get an Array[Row], which does not have that information. So unless you know for sure what your data looks like, get the schema from the SchemaRDD, then collect the rows, and then access each field using the correct type information.
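As a rough illustration of checking the schema before reading fields, here is a sketch that assumes a newer Spark (2.x or later), where SparkSession and DataFrame take the place of SQLContext and SchemaRDD. The column names x and y and the sample row are made up for the example.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumption: Spark 2.x or later is on the classpath; run locally.
val spark = SparkSession.builder().master("local[1]").appName("row-types").getOrCreate()

// Hypothetical schema: one string column and one integer column.
val schema = StructType(Seq(
  StructField("x", StringType, nullable = false),
  StructField("y", IntegerType, nullable = false)
))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row("10479", 6))),
  schema
)

// Inspect the declared type of each column before collecting.
df.schema.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))

val first = df.collect()(0)
val x: Int = first.getString(0).toInt // StringType column: read as String, then convert
val y: Int = first.getInt(1)          // IntegerType column: getInt matches the stored type
```

Printing the schema first shows which getter matches each column, so a mismatch such as calling getInt on a string column can be caught before it throws a ClassCastException.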
Upvotes: 2