Reputation: 1176
I am running into a problem trying to convert one of the columns of a Spark DataFrame from a hexadecimal string to a double. I have the following code:
import java.math.BigInteger
import spark.implicits._

case class MsgRow(block_number: Long, to: String, from: String, value: Double)

def hex2int(hex: String): Double = new BigInteger(hex.substring(2), 16).doubleValue
txs = txs.map(row =>
  MsgRow(row.getLong(0), row.getString(1), row.getString(2), hex2int(row.getString(3)))
)
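(Editor's note: the hex conversion itself is plain JVM code and can be checked outside Spark; a minimal sketch, using the same `BigInteger` parsing as `hex2int` above and assuming the strings carry a `0x` prefix:)

```scala
import java.math.BigInteger

// Same conversion as hex2int above: strip the "0x" prefix,
// parse the remainder as base-16, and widen to Double.
def hex2double(hex: String): Double =
  new BigInteger(hex.substring(2), 16).doubleValue

// e.g. "0xff" parses to 255.0 and "0x10" to 16.0
```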
I can't share the content of my txs dataframe but here is the metadata:
>txs
org.apache.spark.sql.DataFrame = [blockNumber: bigint, to: string ... 4 more fields]
but when I run this I get the error:
error: type mismatch;
 found   : MsgRow
 required: org.apache.spark.sql.Row
       MsgRow(row.getLong(0), row.getString(1), row.getString(2), hex2int(row.getString(3)))
       ^
I don't understand -- why is Spark/Scala expecting a Row object? None of the examples I have seen involves an explicit conversion to a Row, and in fact most of them involve an anonymous function returning a case class object, as I have above. And for some reason, googling "required: org.apache.spark.sql.Row" returns only five results, none of which pertains to my situation -- which is why I made the title so non-specific, since there is little chance of a false positive. Thanks in advance!
Upvotes: 0
Views: 4280
Reputation: 1176
Thank you @Ramesh for pointing out the bug in my code. His solution works, though it does not mention the problem that pertains more directly to my OP: the result returned from map is not a DataFrame but rather a Dataset. Rather than creating a new variable, all I needed to do was change
txs = txs.map(row =>
  MsgRow(row.getLong(0), row.getString(1), row.getString(2), hex2int(row.getString(3)))
)
to
txs = txs.map(row =>
  MsgRow(row.getLong(0), row.getString(1), row.getString(2), hex2int(row.getString(3)))
).toDF
This would probably be the easy answer for most errors matching my title. While @Ramesh's answer got rid of that error, I later ran into another error related to the same fundamental issue when I tried to join this result to another DataFrame.
Upvotes: 0
Reputation: 41957
Your error is because you are storing the output in the same variable: txs expects a Row while you are returning a MsgRow. So changing
txs = txs.map(row =>
  MsgRow(row.getLong(0), row.getString(1), row.getString(2), hex2int(row.getString(3)))
)
to
val newTxs = txs.map(row =>
  MsgRow(row.getLong(0), row.getString(1), row.getString(2), new BigInteger(row.getString(3).substring(2), 16).doubleValue)
)
should solve your issue.
I have excluded the hex2int function, as it was giving a serialization error.
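(Editor's note: a common way around that kind of serialization error in Spark is to move the helper into a top-level object, so the closure references it through a static module reference instead of capturing a non-serializable enclosing instance. A hedged sketch; `HexConversions` and `hex2double` are illustrative names, not from the original post:)

```scala
import java.math.BigInteger

// Methods on a top-level object are reached via a static
// reference, so Spark tasks calling hex2double do not need to
// serialize any enclosing instance along with the closure.
object HexConversions {
  def hex2double(hex: String): Double =
    new BigInteger(hex.substring(2), 16).doubleValue
}

// e.g. HexConversions.hex2double("0x1a") yields 26.0
```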
Upvotes: 1