knowone

Reputation: 840

RDD split gives missing parameter type

I'm trying to split an RDD that was originally created from a DataFrame, and I'm not sure why I get the error below.

I haven't written out every column name here, but the actual SQL contains all of them, so there is nothing wrong with the query.

val df = sql("SELECT col1, col2, col3,... from tableName")
val rddF = df.toJavaRDD

rddF.take(1)
res46: Array[org.apache.spark.sql.Row] = Array([2017-02-26,100102-AF,100134402,119855,1004445,0.0000,0.0000,-3.3,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000]

scala> rddF.map(x => x.split(","))
<console>:31: error: missing parameter type
       rddF.map(x => x.split(","))

Any idea what causes the error? I'm using Spark 2.2.0.

Upvotes: 0

Views: 92

Answers (1)

Ramesh Maharjan

Reputation: 41957

rddF contains Row objects, as you can see in res46: Array[org.apache.spark.sql.Row], and you can't split a Row the way you split a String.

You can do something like the following:

val df = sql("SELECT col1, col2, col3,... from tableName")
val rddF = df.rdd

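// getAs needs an explicit type parameter; String for col1 is an assumption about the column type
rddF.map(x => (x.getAs[String]("col1"), x.getAs[String]("col2"), x.get(2)))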
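
If the goal of the original split(",") was to get every column of a row as a string, one option is to go through Row.toSeq instead. The sketch below is only an illustration: the local SparkSession, the three-column DataFrame, and its values are made-up stand-ins for the real table, and mapping nulls to empty strings is an arbitrary choice. It also uses df.rdd rather than df.toJavaRDD. The "missing parameter type" message most likely comes from JavaRDD.map expecting a Java Function object, which a Scala lambda does not satisfy on the Scala 2.11 builds that Spark 2.2.0 uses.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("row-fields-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the real table (column names and values are made up).
val df = Seq(("2017-02-26", "100102-AF", 100134402L)).toDF("col1", "col2", "col3")

// df.rdd yields a Scala RDD[Row], so the lambda's parameter type can be inferred.
val rddF = df.rdd

// Convert each Row into an Array[String], one element per column.
// Nulls become empty strings here; pick whatever placeholder suits your data.
val fields = rddF.map { row =>
  row.toSeq.map {
    case null => ""
    case v    => v.toString
  }.toArray
}

// Bring the first converted row back to the driver and print it.
val first = fields.first()
println(first.mkString(" | "))

spark.stop()
```

If a single delimited string per row is enough, row.mkString(",") gives that directly, and it can then be split like any other String.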

Upvotes: 1
