Reputation: 840
I'm trying to split an RDD that was originally created from a DataFrame, and I don't understand the error.
I'm not writing out every column name here, but the SQL contains all of them, so there's nothing wrong with the SQL.
val df = sql("SELECT col1, col2, col3,... from tableName")
val rddF = df.toJavaRDD
rddF.take(1)
res46: Array[org.apache.spark.sql.Row] = Array([2017-02-26,100102-AF,100134402,119855,1004445,0.0000,0.0000,-3.3,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000])
scala> rddF.map(x => x.split(","))
<console>:31: error: missing parameter type
rdd3.map(x => x.split(","))
Any idea what causes the error? I'm using Spark 2.2.0.
Upvotes: 0
Views: 92
Reputation: 41957
rddF is an RDD of Row (as you can see from the result of take(1): res46: Array[org.apache.spark.sql.Row]), and you can't split a Row the way you split a String.
You can do something like below instead:
val df = sql("SELECT col1, col2, col3,... from tableName")
val rddF = df.rdd
// Access Row fields by name with an explicit type, or by position with get
rddF.map(x => (x.getAs[String]("col1"), x.getAs[String]("col2"), x.get(2)))
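If the intent behind the original split(",") was just to get at the individual field values, note that a Row already holds them as a typed, indexed sequence; there is nothing to split. A plain-Scala sketch of the idea (using a Seq[Any] to stand in for a Row, since no Spark session is assumed here; the values are taken from the res46 output above):

```scala
// A Row behaves like an indexed sequence of typed values, not a String,
// so the fields can be read out directly instead of splitting.
val row: Seq[Any] = Seq("2017-02-26", "100102-AF", -3.3)

// Row also has mkString(sep) if you really want one delimited String.
val joined = row.mkString(",")

// Positional access with a cast, analogous to row.getAs[String](0) / row.get(2).
val date  = row(0).asInstanceOf[String]
val value = row(2).asInstanceOf[Double]

println(joined)            // 2017-02-26,100102-AF,-3.3
println(s"$date -> $value") // 2017-02-26 -> -3.3
```

The same pattern on the real RDD would be rddF.map(_.mkString(",")), since Row exposes mkString as well.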
Upvotes: 1