Reputation: 761
val DF = Seq("310:120:fe5ab02").toDF("id")
Input:
+-----------------+
| id              |
+-----------------+
| 310:120:fe5ab02 |
+-----------------+
Expected output:
+-----------------+-----+---------+
| id              | id1 | id2     |
+-----------------+-----+---------+
| 310:120:fe5ab02 | 2   | 1041835 |
+-----------------+-----+---------+
I need to take two substrings of a string column, convert each from hexadecimal to decimal, and store the results in two new DataFrame columns.
id1 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> x.substring(5) -> 02 -> Integer.parseInt(x, 16) -> 2
id2 -> 310:120:fe5ab02 -> x.split(":")(2) -> fe5ab02 -> x.substring(0, 5) -> fe5ab -> Integer.parseInt(x, 16) -> 1041835
From "310:120:fe5ab02" I need "fe5ab02", which I get with x.split(":")(2). From that I need the two substrings "fe5ab" and "02", which I get with x.substring(0, 5) and x.substring(5). Finally, I convert each to decimal with Integer.parseInt(x, 16).
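For reference, the plain-Scala steps described above can be sketched as one small function. The name `hexParts` is made up for illustration; it is not part of any library.

```scala
// Hypothetical helper: takes the third ":"-separated field
// and parses its two hex substrings into Ints.
def hexParts(id: String): (Int, Int) = {
  val hex = id.split(":")(2)                          // "fe5ab02"
  val id1 = Integer.parseInt(hex.substring(5), 16)    // "02"    -> 2
  val id2 = Integer.parseInt(hex.substring(0, 5), 16) // "fe5ab" -> 1041835
  (id1, id2)
}

println(hexParts("310:120:fe5ab02")) // prints (2,1041835)
```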
These steps work fine individually, but I need each conversion in a single withColumn statement, like below:
val DF1 = DF
.withColumn("id1", expr("""Integer.parseInt((id.split(":")(2)).substring(5), 16)"""))
.withColumn("id2", expr("""Integer.parseInt((id.split(":")(2)).substring(0, 5), 16)"""))
display(DF1)
I am getting a parsing exception.
Upvotes: 0
Views: 871
Reputation: 492
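import org.apache.spark.sql.functions.udf
// `$"..."` needs spark.implicits._ (already in scope in spark-shell and notebooks).
// `df` is the DataFrame from the question (named DF there).
case class SplitId(part1: Int, part2: Int)
// Takes the third ":"-separated field and parses its two hex substrings.
def splitHex: (String => SplitId) = { s =>
  val str: String = s.split(":")(2)
  SplitId(Integer.parseInt(str.substring(5), 16), Integer.parseInt(str.substring(0, 5), 16))
}
val splitHexUDF = udf(splitHex)
df.withColumn("splitId", splitHexUDF(df("id")))
  .withColumn("id1", $"splitId.part1")
  .withColumn("id2", $"splitId.part2")
  .drop($"splitId")
  .show()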
+---------------+---+-------+
| id|id1| id2|
+---------------+---+-------+
|310:120:fe5ab02| 2|1041835|
+---------------+---+-------+
Alternatively, you can use the snippet below, which avoids a UDF:
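import org.apache.spark.sql.functions._
val df2 = df.withColumn("splitId", split($"id", ":")(2))
  // conv(..., 16, 10) does the hex-to-decimal conversion; a plain cast("int") only works when the hex digits are all 0-9
  .withColumn("id1", conv($"splitId".substr(lit(6), length($"splitId")), 16, 10).cast("int"))
  // Spark SQL substring positions are 1-based
  .withColumn("id2", conv(substring($"splitId", 1, 5), 16, 10).cast("int"))
  .drop($"splitId")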
df2.printSchema
root
|-- id: string (nullable = true)
|-- id1: integer (nullable = true)
|-- id2: integer (nullable = true)
df2.show()
+---------------+---+-------+
| id|id1| id2|
+---------------+---+-------+
|310:120:fe5ab02| 2|1041835|
+---------------+---+-------+
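If you want to stay close to the `expr(...)` form from the question, note that `expr` parses Spark SQL, not Scala, so `Integer.parseInt` and `.split(":")(2)` are not understood there, which would explain the parsing exception. A minimal sketch using the SQL functions `split`, `substring` and `conv` instead; the local `SparkSession` setup and the helper name `addHexColumns` are assumptions for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Assumption: a local session; in spark-shell or a notebook `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("hex-split").getOrCreate()
import spark.implicits._

// Hypothetical helper: one expr(...) per new column, written in Spark SQL.
def addHexColumns(df: DataFrame): DataFrame = {
  df
    // split(id, ':')[2] is the third field; substring(..., 6) is the rest from position 6 (1-based)
    .withColumn("id1", expr("cast(conv(substring(split(id, ':')[2], 6), 16, 10) as int)"))
    // substring(..., 1, 5) is the first five characters
    .withColumn("id2", expr("cast(conv(substring(split(id, ':')[2], 1, 5), 16, 10) as int)"))
}

val result = addHexColumns(Seq("310:120:fe5ab02").toDF("id"))
result.show()
```

For the sample row this should give id1 = 2 and id2 = 1041835, the same as the outputs above, with each column produced by a single `withColumn` call and no intermediate `splitId` column.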
Upvotes: 1