Venkatesh Gotimukul

Scala: converting a hexadecimal substring of a DataFrame column to decimal (org.apache.spark.sql.catalyst.parser.ParseException)

val DF = Seq("310:120:fe5ab02").toDF("id")

Input:

+-----------------+
|       id        |
+-----------------+
| 310:120:fe5ab02 |
+-----------------+

Expected output:

+-----------------+-------------+---------+
|       id        |     id1     |   id2   |
+-----------------+-------------+---------+
| 310:120:fe5ab02 |      2      | 1041835 |
+-----------------+-------------+---------+

I need to take two substrings of the id string, convert each one from hexadecimal to decimal, and store the results in two new DataFrame columns.

id1 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> x.substring(5)    -> 02    -> Integer.parseInt(x, 16) -> 2
id2 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> x.substring(0, 5) -> fe5ab -> Integer.parseInt(x, 16) -> 1041835

From "310:120:fe5ab02" I need "fe5ab02", which I get with x.split(":")(2). From that I need the two substrings "fe5ab" and "02", which I get with x.substring(0, 5) and x.substring(5). Finally, I convert each one to decimal with Integer.parseInt(x, 16).
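The expected values can be checked by hand: each hex digit (a to f stand for 10 to 15) is multiplied by a power of 16.

```latex
\begin{aligned}
\mathtt{fe5ab}_{16} &= 15\cdot 16^{4} + 14\cdot 16^{3} + 5\cdot 16^{2} + 10\cdot 16^{1} + 11\cdot 16^{0} \\
                    &= 983040 + 57344 + 1280 + 160 + 11 = 1041835 \\
\mathtt{02}_{16}    &= 0\cdot 16^{1} + 2\cdot 16^{0} = 2
\end{aligned}
```

Note that 02 has the same value in base 10 and base 16, so this example alone cannot distinguish a hex parse from a decimal one; a suffix such as 0f (= 15) would.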

Done in plain Scala, these conversions work individually, but I need each one computed inside a withColumn call, like this:

val DF1 = DF
.withColumn("id1", expr("""Integer.parseInt((id.split(":")(2)).substring(5), 16)"""))
.withColumn("id2", expr("""Integer.parseInt((id.split(":")(2)).substring(0, 5), 16)"""))

display(DF1)

This fails with org.apache.spark.sql.catalyst.parser.ParseException.
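The exception comes from expr(): it parses its argument as a Spark SQL expression, and Integer.parseInt(...), the (2) index after split(...) and method calls like .substring(5) are Scala, not Spark SQL. A minimal sketch that keeps the expr approach, assuming the Spark SQL built-ins split, substring, conv and cast are used instead; the object and method names (HexExprColumns, withHexColumns) are illustrative only:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

object HexExprColumns {
  // Adds id1/id2 using Spark SQL expressions only (no Scala inside the strings):
  //   split(id, ':')[2]  -> third field, e.g. "fe5ab02" (array index is 0-based)
  //   substring(s, 6)    -> "02"    (SQL string positions are 1-based)
  //   substring(s, 1, 5) -> "fe5ab"
  //   conv(x, 16, 10)    -> base-16 string to base-10 string, then cast to int
  def withHexColumns(df: DataFrame): DataFrame =
    df.withColumn("id1", expr("cast(conv(substring(split(id, ':')[2], 6), 16, 10) as int)"))
      .withColumn("id2", expr("cast(conv(substring(split(id, ':')[2], 1, 5), 16, 10) as int)"))
}
```

Called as HexExprColumns.withHexColumns(DF), this should give the id1 and id2 values from the expected output above.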


Answers (1)

Mansoor Baba Shaik

Using a UDF (df below is the DataFrame named DF in the question):

case class SplitId(part1: Int, part2: Int)

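// Takes the third ':'-separated field (e.g. "fe5ab02") and parses characters 5.. as
// part1 (id1) and characters 0-4 as part2 (id2), both as base-16 integers.
val splitHex: String => SplitId = { s =>
  val str: String = s.split(":")(2)
  SplitId(Integer.parseInt(str.substring(5), 16), Integer.parseInt(str.substring(0, 5), 16))
}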

import org.apache.spark.sql.functions.udf

val splitHexUDF = udf(splitHex)

df.withColumn("splitId", splitHexUDF(df("id")))
  .withColumn("id1", $"splitId.part1")
  .withColumn("id2", $"splitId.part2")
  .drop($"splitId")
  .show()
+---------------+---+-------+
|             id|id1|    id2|
+---------------+---+-------+
|310:120:fe5ab02|  2|1041835|
+---------------+---+-------+
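One caveat: splitHex throws if id is null, has fewer than three ':'-separated fields, or the third field is too short or not valid hex, and an exception inside a UDF fails the whole job. A possible variant, sketched under the assumption that null id1/id2 values are acceptable for such rows, wraps the parsing in Option and Try; SafeSplitHex is an illustrative name, not part of Spark:

```scala
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import scala.util.Try

object SafeSplitHex {
  case class SplitId(part1: Int, part2: Int)

  // Returns None instead of throwing when the input is null, has fewer than
  // three ':'-separated fields, is too short, or contains non-hex characters.
  def parse(s: String): Option[SplitId] =
    Option(s).flatMap { str =>
      Try {
        val hex = str.split(":")(2)
        SplitId(Integer.parseInt(hex.substring(5), 16), Integer.parseInt(hex.substring(0, 5), 16))
      }.toOption
    }

  // A None result becomes a null struct column in Spark.
  val splitHexUDF: UserDefinedFunction = udf(parse _)
}
```

It is used the same way as splitHexUDF, e.g. df.withColumn("splitId", SafeSplitHex.splitHexUDF(df("id"))); malformed rows then get a null struct, so $"splitId.part1" and $"splitId.part2" are null for them.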

Alternatively, you can do the same without a UDF, using Spark's built-in functions:

import org.apache.spark.sql.functions._

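val df2 = df.withColumn("splitId", split($"id", ":")(2))
  // conv(..., 16, 10) parses hex; a plain cast("int") would read "02" as decimal and give null for digits a-f
  .withColumn("id1", conv($"splitId".substr(lit(6), length($"splitId") - 5), 16, 10).cast("int"))
  .withColumn("id2", conv(substring($"splitId", 1, 5), 16, 10).cast("int"))
  .drop($"splitId")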

df2.printSchema
root
 |-- id: string (nullable = true)
 |-- id1: integer (nullable = true)
 |-- id2: integer (nullable = true)

df2.show()
+---------------+---+-------+
|             id|id1|    id2|
+---------------+---+-------+
|310:120:fe5ab02|  2|1041835|
+---------------+---+-------+
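If the same hex-substring conversion is needed for more columns, the conv/substring pattern can be factored into a small helper. This is only a sketch: hexSubstrToInt and HexColumnOps are hypothetical names, positions are 1-based as in Spark's substring, and withIds assumes the hex field has exactly 7 characters, as in the question:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, conv, split, substring}

object HexColumnOps {
  // Hypothetical helper: take `len` characters of `c` starting at 1-based `pos`,
  // read them as base 16 with conv (which returns a string), then cast to int.
  def hexSubstrToInt(c: Column, pos: Int, len: Int): Column =
    conv(substring(c, pos, len), 16, 10).cast("int")

  // Applies the helper to the third ':'-separated field of `id`,
  // assuming that field has 7 hex digits (e.g. "fe5ab02").
  def withIds(df: DataFrame): DataFrame = {
    val hex = split(col("id"), ":")(2)
    df.withColumn("id1", hexSubstrToInt(hex, 6, 2))
      .withColumn("id2", hexSubstrToInt(hex, 1, 5))
  }
}
```

With an id such as "310:120:0000a0f", this should give id1 = 15 and id2 = 10, whereas casting the substring "0f" directly to int would not yield 15.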
