helm

Reputation: 713

org.apache.spark.sql.Row to Int

I'm trying to get an Integer from a SQL statement in spark-sql.

var num_en = ctx.sql("SELECT count(*) FROM table WHERE lang = 'en'")
num = num_en.collect()(0)

num_en is a SchemaRDD, and num, according to the error I get, is a "Row".

<console>:144: error: type mismatch;
 found   : org.apache.spark.sql.Row
    (which expands to)  org.apache.spark.sql.catalyst.expressions.Row

The problem is that I can't find any useful documentation for either org.apache.spark.sql.Row or org.apache.spark.sql.catalyst.expressions.Row.

How can I extract this one integer value that the SQL statement returns for later use?

Upvotes: 4

Views: 12948

Answers (2)

edC0der

Reputation: 1

The reason for this is that num_en is a SchemaRDD. When you call collect() on it, you get an Array[org.apache.spark.sql.Row], so num_en.collect()(0) gives you the first Row of that Array, not the value inside it.

Upvotes: 0

maasg

Reputation: 37435

The best doc is the source

Row.scala

  /**
   * Returns the value of column `i` as an int.  This function will throw an exception if the value
   * is at `i` is not an integer, or if it is null.
   */
  def getInt(i: Int): Int =
    row.getInt(i)

Applied to your example:

num = num_en.collect()(0).getInt(0)
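
One caveat, as far as I know: the type of count(*) depends on the Spark version. Later releases return it as a Long, in which case getInt(0) fails with a cast error and getLong(0) is the call to use. If you are not sure which one you have, a minimal sketch is to read the cell untyped with row(0) and convert it yourself. The helper name countToInt below is hypothetical, not part of Spark:

```scala
// Hypothetical helper (not a Spark API): converts the untyped cell
// returned by row(0) into an Int, whichever numeric type Spark used.
def countToInt(value: Any): Int = value match {
  case i: Int => i
  case l: Long =>
    // Fail loudly instead of silently truncating a large count.
    require(l.isValidInt, s"count $l does not fit in an Int")
    l.toInt
  case other =>
    throw new IllegalArgumentException(s"unexpected count type: $other")
}

// Usage in the shell from the question (assumes num_en as defined there):
// val num = countToInt(num_en.collect()(0)(0))
```

If you know the column is a Long, num_en.collect()(0).getLong(0) is simpler, and keeping the result as a Long avoids the overflow check altogether.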

Upvotes: 8
