Ken Williams

Reputation: 23995

Apache Spark: get elements of Row by name

In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract values by name? I can see how to do some really awkward stuff:

def foo(r: Row) = {
  val ix = (0 until r.schema.length).map( i => r.schema(i).name -> i).toMap
  val field1 = r.getString(ix("field1"))
  val field2 = r.getLong(ix("field2"))
  ...
}
dataframe.map(foo)

I figure there must be a better way - this is pretty verbose, it requires creating this extra lookup structure, and it requires knowing the field types explicitly, which, if incorrect, will produce a runtime exception rather than a compile-time error.

Upvotes: 17

Views: 28352

Answers (2)

Kexin Nie

Reputation: 501

You can use getAs[T] from org.apache.spark.sql.Row, which looks a field up by name. Note the explicit type parameter - without it the compiler cannot infer the result type:

r.getAs[String]("field1")
r.getAs[Long]("field2")

See the Row API docs for getAs(java.lang.String fieldName).
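For intuition: the name-to-index map the question builds by hand is essentially what Row.fieldIndex and getAs[T] do internally. A minimal sketch of that mechanism without a Spark dependency, using a hypothetical SimpleRow class (illustration only, not Spark's actual implementation):

```scala
// Hypothetical stand-in for Spark's Row, for illustration only.
// Spark's real Row.getAs[T](fieldName) likewise resolves the name to
// an index via the schema, then casts the value at that index.
case class SimpleRow(fieldNames: Seq[String], values: Seq[Any]) {
  // Build the name -> index map once per row schema.
  private val index: Map[String, Int] = fieldNames.zipWithIndex.toMap

  def fieldIndex(name: String): Int = index(name)

  // Like Row.getAs[T]: fetch by name, then cast. A wrong T still fails
  // only at runtime (ClassCastException), as the question points out.
  def getAs[T](name: String): T = values(fieldIndex(name)).asInstanceOf[T]
}

val r = SimpleRow(Seq("field1", "field2"), Seq("hello", 42L))
val field1 = r.getAs[String]("field1")
val field2 = r.getAs[Long]("field2")
```

This keeps the lookup by name but, as with Spark's real getAs, the type check still happens at runtime rather than compile time.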

Upvotes: 39

Justin Pihony

Reputation: 67115

This is not supported at this time in the Scala API. The closest thing is the JIRA ticket titled "Support converting DataFrames to typed RDDs".

Upvotes: 3
