Dmitry Petrov
Dmitry Petrov

Reputation: 1547

Spark map and flatMap result types

It looks like map and flatMap return different types.

mySchamaRdd.map( p => Row.fromSeq(...)) returns org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] which is required for applySchema function (or createDataFrame in spark 1.3).

However, mySchamaRdd.flatMap( p => Row.fromSeq(...) returns org.apache.spark.rdd.RDD[Any] and I cannot call applySchema().

How can I use applySchema() after flatMap()?

An example (input schema: Name, Description)

Bob, "Software developer"
John, "I like spaghetti"

Result:

Bob, Software
Bob, Developer
John, I
John, like
John, spaghetti

Upvotes: 0

Views: 2734

Answers (1)

ale64bit
ale64bit

Reputation: 6242

Maybe I misunderstood the way you are creating your SchemaRDD, or maybe you misunderstood the way flatMap is supposed to work. Did you try this?

mySchemaRDD.flatMap( p => p.getString(1).split(" +").map( x => Row((p.getString(0), x))))

I think that mySchamaRdd.flatMap( p => Row.fromSeq(...)) is not the appropriate use of flatMap, since you are supposed to return a sequence of things embedded in something, in order to flat it and extract what you really want to return. In your case, you are embedding it in a Row, while the result type you want to return is Row itself.

Upvotes: 0

Related Questions