Reputation: 1547
It looks like map and flatMap return different types.
mySchamaRdd.map( p => Row.fromSeq(...))
returns org.apache.spark.rdd.RDD[org.apache.spark.sql.Row], which is what the applySchema function (or createDataFrame in Spark 1.3) requires.
However, mySchamaRdd.flatMap( p => Row.fromSeq(...))
returns org.apache.spark.rdd.RDD[Any], so I cannot call applySchema() on the result.
How can I use applySchema() after flatMap()?
An example (input schema: Name, Description):
Bob, "Software developer"
John, "I like spaghetti"
Result:
Bob, Software
Bob, developer
John, I
John, like
John, spaghetti
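For reference, here is a minimal sketch of what I am trying (the sqlContext name and the two string columns are just my illustration, on a pre-1.3 setup; in 1.3 the Struct* types live in org.apache.spark.sql.types):
import org.apache.spark.sql._

// Illustrative two-column target schema: Name, Word
val schema = StructType(Seq(
  StructField("Name", StringType, nullable = true),
  StructField("Word", StringType, nullable = true)))

// map keeps the element type as Row, so applySchema accepts the result
val mapped = mySchamaRdd.map(p => Row.fromSeq(Seq(p.getString(0), p.getString(1))))
sqlContext.applySchema(mapped, schema)

// flatMap with the same function is inferred as RDD[Any], so this does not compile
// sqlContext.applySchema(mySchamaRdd.flatMap(p => Row.fromSeq(Seq(p.getString(0), p.getString(1)))), schema)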
Upvotes: 0
Views: 2734
Reputation: 6242
Maybe I misunderstood the way you are creating your SchemaRDD, or maybe you misunderstood the way flatMap is supposed to work. Did you try this?
mySchemaRDD.flatMap( p => p.getString(1).split(" +").map( x => Row(p.getString(0), x)))
I think that mySchamaRdd.flatMap( p => Row.fromSeq(...)) is not an appropriate use of flatMap: the function you pass to flatMap has to return a collection of elements, which flatMap then flattens into the resulting RDD. In your case you return a single Row, and since a Row is itself a sequence of its values (in pre-1.3 Spark), flatMap flattens each Row into its individual fields, which is why you end up with RDD[Any] instead of the RDD[Row] you actually want.
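Putting it together, something along these lines should give you an RDD[Row] that applySchema accepts (a sketch only; I am assuming your sqlContext and a simple two-string-column schema, and on 1.3 the Struct* types are under org.apache.spark.sql.types):
import org.apache.spark.sql._

// One output Row per word of the description, keyed by the name column
val words = mySchemaRDD.flatMap { p =>
  p.getString(1).split(" +").map(word => Row(p.getString(0), word))
}

// words is an RDD[Row], so it can be paired with an explicit schema
val schema = StructType(Seq(
  StructField("Name", StringType, nullable = true),
  StructField("Word", StringType, nullable = true)))

val result = sqlContext.applySchema(words, schema)  // or sqlContext.createDataFrame(words, schema) in 1.3
result.collect().foreach(println)                   // e.g. [Bob,Software], [Bob,developer], [John,I], ...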
Upvotes: 0