navige

Reputation: 2517

Case Class from Parquet using Spark

I have worked through some example code on how to store data in a Parquet file and implemented it pretty much as shown in the programming guide:

import org.apache.spark.sql.types._

val schema = StructType(
  List(StructField("id", LongType, false), StructField("values", ArrayType(FloatType), false))
)
val dataframe = sqlContext.createDataFrame(rowRDD, schema)
dataframe.saveAsParquetFile("file.parquet")

When reading the Parquet file back, I use

sqlContext.parquetFile("file.parquet")

The examples in the programming guide always assume that you are working with strings, so the following works in a pretty straightforward way:

data.map(t => "Name: " + t(0)).collect().foreach(println)

However, as you can see in my schema definition, I work with a float array. Of course, I could parse the string into a float array myself, but that doesn't seem to be the right way of doing it. What is the best way of doing this?

Upvotes: 1

Views: 1045

Answers (1)

Justin Pihony

Reputation: 67065

Row returns an Any when used with the base indexer, so you should be able to just use t.getSeq[Float](0) and it will return your data as a Seq[Float]. You can also call printSchema on your DataFrame to verify that the type is indeed an ArrayType.
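For reference, a minimal sketch of how that could look when reading the file back, assuming the Spark 1.x sqlContext API and the schema from the question (where "id" is field 0 and "values" is field 1):

// Read the Parquet file back into a DataFrame (Spark 1.x API)
val data = sqlContext.parquetFile("file.parquet")

// Verify that "values" shows up as array<float> in the schema
data.printSchema()

// Pull the float array out of each Row with getSeq[Float]
data.map(t => "Id: " + t.getLong(0) + ", values: " + t.getSeq[Float](1).mkString(", "))
  .collect()
  .foreach(println)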

Upvotes: 1

Related Questions