Reputation: 314
I'm trying to convert an RDD of my custom objects (a Java class) to a DataFrame. I simply call hiveContext.createDataFrame, specifying the class of the object. The problem is that the DataFrame is created with its columns in a seemingly arbitrary order, so once I write the DF to Hive the values end up in the wrong columns. Here is my code:
var objectRDD = tableDF.map((r: Row) => new Attuatore(r(0),r(1)...))
[.. operations with the RDD ..]
val resultDF = hiveContext.createDataFrame(objectRDD, classOf[Attuatore])
resultDF.write.mode("append").saveAsTable(outputTable)
The only solution I have found so far to get the fields in the right order is to convert the RDD[Attuatore] back to an RDD[Row] and then call createDataFrame() with an explicit schema. Since I have to do this for a lot of classes, I would prefer the first approach, which keeps the code much cleaner.
Upvotes: 1
Views: 1522
Reputation: 418
As the documentation for HiveContext.createDataFrame says:
"Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order."
So if you need the fields in a defined order, you have to select them explicitly, e.g.
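import org.apache.spark.sql.functions.col
// List the columns in the order the target table expects
val MY_COLUMNS = Seq("field1", "field2", ...)
val conformedDF = resultDF.select(MY_COLUMNS.map(col(_)):_*)
conformedDF.write...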
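If you have many bean classes and do not want to maintain a column list for each one, one option is to wrap the select in a small helper and take the order from the target table itself. This is only a sketch: `ColumnOrder` and `reorder` are hypothetical names, not part of Spark, and the usage lines assume the Hive table already exists and that its column names match the bean's property names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper (not a Spark API): returns df with its columns
// selected in exactly the order given by `wanted`.
object ColumnOrder {
  def reorder(df: DataFrame, wanted: Seq[String]): DataFrame =
    df.select(wanted.map(name => col(name)): _*)
}

// Usage sketch, reusing the names from the question. It assumes the
// target table already exists, so its schema can supply the order:
//
//   val targetColumns = hiveContext.table(outputTable).columns.toSeq
//   val conformedDF   = ColumnOrder.reorder(resultDF, targetColumns)
//   conformedDF.write.mode("append").saveAsTable(outputTable)
```

The select fails with an analysis error if a wanted column is missing from the DataFrame, which is arguably better than silently writing values into the wrong columns.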
Upvotes: 1