Reputation: 713
I'm trying to get an Integer from a SQL statement in spark-sql.
var num_en = ctx.sql("SELECT count(*) FROM table WHERE lang = 'en'")
num = num_en.collect()(0)
num_en is a SchemaRDD, and num, according to the error I get, is a "Row".
<console>:144: error: type mismatch;
found : org.apache.spark.sql.Row
(which expands to) org.apache.spark.sql.catalyst.expressions.Row
The problem is that I can't find any useful documentation for either org.apache.spark.sql.Row or org.apache.spark.sql.catalyst.expressions.Row.
How can I extract this one integer value that the SQL statement returns for later use?
Upvotes: 4
Views: 12948
Reputation: 1
The reason for this is that num_en is a SchemaRDD. When you call collect() on it, you get an Array[org.apache.spark.sql.Row], so num_en.collect()(0) gives you the first Row of that Array.
Upvotes: 0
Reputation: 37435
The best documentation is the source:
/**
 * Returns the value of column `i` as an int. This function will throw an exception if the value
 * is at `i` is not an integer, or if it is null.
 */
def getInt(i: Int): Int =
  row.getInt(i)
Applied to your example:
num = num_en.collect()(0).getInt(0)
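One caveat worth checking: as far as I know, count(*) comes back as a Long column in Spark SQL, so getInt(0) may throw a ClassCastException. If it does, read the value with getLong(0) and convert it. Here is a minimal, self-contained sketch of that. It assumes the newer SparkSession API (Spark 2.x and later) rather than the SchemaRDD-era context from the question, and the view name `docs`, its sample rows, and the object name `CountExample` are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object CountExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession stands in for the question's `ctx`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data registered as a temporary view named `docs`.
    Seq(("a", "en"), ("b", "fr"), ("c", "en"))
      .toDF("id", "lang")
      .createOrReplaceTempView("docs")

    // collect() returns Array[Row]; (0) picks the single result row.
    val row = spark.sql("SELECT count(*) FROM docs WHERE lang = 'en'").collect()(0)

    // Read the first column as a Long, then narrow it to an Int.
    val num: Int = row.getLong(0).toInt

    println(num)
    spark.stop()
  }
}
```

The toInt conversion silently truncates counts larger than Int.MaxValue, so keeping the value as a Long is the safer choice if the table can get that big.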
Upvotes: 8