Reputation: 2517
I have worked through some example code on how to store data in a parquet file, and implemented it pretty much as shown in the programming guide:
val schema = StructType(
List(StructField("id", LongType, false), StructField("values", ArrayType(FloatType), false))
)
val dataframe = sqlContext.createDataFrame(rowRDD, schema)
dataframe.saveAsParquetFile("file.parquet")
When reading the parquet file, I use
sqlContext.parquetFile("file.parquet")
The examples in the programming guide always assume that you work with strings, so the following is pretty straightforward:
data.map(t => "Name: " + t(0)).collect().foreach(println)
However, as you can see in my schema definition, I work with a float array. Of course, I could parse the string to a float array myself, but that doesn't seem to be the right way of doing it. What is the best way of doing this?
Upvotes: 1
Views: 1045
Reputation: 67065
Row returns an Any when used with the base indexer, so use a typed getter instead: t.getSeq[Float](1) will return your data as a Seq[Float] (index 1 is the values column in your schema; index 0 is the Long id). You can also call printSchema on your DataFrame to verify that the column type is indeed an ArrayType.
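A minimal sketch of the round trip, assuming the Spark 1.3-era API used in the question (saveAsParquetFile, parquetFile) and a local master. The object name ReadFloatArrays, the helper rowToPair, and the sample rows are hypothetical and only there for illustration. Note that the array is read from index 1, because in the question's schema index 0 holds the Long id.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types._

object ReadFloatArrays {
  // Hypothetical helper: pull the typed fields out of a Row.
  // Index 0 is the Long id, index 1 is the float array.
  def rowToPair(t: Row): (Long, Seq[Float]) =
    (t.getLong(0), t.getSeq[Float](1))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("read-float-arrays").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Same schema as in the question.
    val schema = StructType(List(
      StructField("id", LongType, false),
      StructField("values", ArrayType(FloatType), false)))

    // Made-up sample data standing in for the question's rowRDD.
    val rowRDD = sc.parallelize(Seq(
      Row(1L, Seq(1.0f, 2.5f)),
      Row(2L, Seq(3.0f))))

    // Fails if file.parquet already exists, so remove it between runs.
    sqlContext.createDataFrame(rowRDD, schema).saveAsParquetFile("file.parquet")

    val data = sqlContext.parquetFile("file.parquet")
    data.printSchema() // the values column should show up as an array of float

    // No string parsing needed: the typed getters return the stored types.
    data.map(rowToPair).collect().foreach { case (id, values) =>
      println(s"id: $id, values: ${values.mkString(", ")}")
    }

    sc.stop()
  }
}
```

If you prefer a single generic accessor, t.getAs[Seq[Float]](1) should behave the same way as getSeq here.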
Upvotes: 1