Reputation: 229
How to use a Scala UDF to change the order of the characters in a string
root
|-- Loc: string (nullable = true)
+----------------+
| Loc|
+----------------+
|8106f510000dc502|
+----------------+
I want to convert 8106f510000dc502 to 08f150000dc50261,
i.e. reorder the characters by (1-based) position as [3,1,5,7,6,(8-16),4,2]
Upvotes: 1
Views: 180
Reputation: 41957
If you want to turn 100z200 into 200z100 in column Loc, then defining a udf function as below should be sufficient (assuming that every string in the column has a z in the middle):
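import org.apache.spark.sql.functions.udf

// Moves the part after the first "z" in front of the part before it
def reverseReplace = udf((str: String) => {
  val index = str.indexOf("z")
  str.substring(index + 1, str.length) + str.substring(index, index + 1) + str.substring(0, index)
})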
You can call the udf function as follows:
val m4=msc3.select("Loc").withColumn("Info", reverseReplace($"Loc"))
m4.show(false)
You will get the following output:
+-------+-------+
|Loc |Info |
+-------+-------+
|100z200|200z100|
|30z400 |400z30 |
|600z10 |10z600 |
+-------+-------+
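Note that the udf above relies on the assumption that z is always present: when indexOf returns -1, the substring calls throw. A minimal plain-Scala sketch of the same swap that leaves such values untouched is shown here; the names SwapSketch and swapAround are made up for illustration and are not part of the answer. The function could be wrapped in udf(...) in the same way as reverseReplace.

```scala
object SwapSketch {
  // Hypothetical helper: swaps the parts before and after the first
  // occurrence of `sep`. Null input is not handled in this sketch.
  def swapAround(str: String, sep: String): String = {
    val index = str.indexOf(sep)
    if (index < 0) str // separator missing: return the input unchanged
    else str.substring(index + sep.length) + sep + str.substring(0, index)
  }
}
```

For example, SwapSketch.swapAround("100z200", "z") gives 200z100, while a value without z comes back unchanged.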
Edited
As I understand from your updated question, you want the final result in [3,1,5,7,6,(8-16),4,2] order. The following udf function does that:
def reverseReplace = udf((str: String) => {
  val len = str.length
  // end of the (8-16) range, capped at the string length
  val index = math.min(len, 16)
  var finalStr = ""
  // positions are 1-based, so position n is substring(n-1, n);
  // each length check skips positions the string does not have
  if (len > 2)
    finalStr += str.substring(3-1, 3)
  if (len > 0)
    finalStr += str.substring(1-1, 1)
  if (len > 4)
    finalStr += str.substring(5-1, 5)
  if (len > 6)
    finalStr += str.substring(7-1, 7)
  if (len > 5)
    finalStr += str.substring(6-1, 6)
  if (len > 7)
    finalStr += str.substring(8-1, index)
  if (len > 3)
    finalStr += str.substring(4-1, 4)
  if (len > 1)
    finalStr += str.substring(2-1, 2)
  if (finalStr == "")
    finalStr = str
  finalStr
})
You can call this udf function in the same way as shown above.
Upvotes: 0
Reputation: 74619
That looks like a Scala coding assignment and has almost nothing to do with Spark.
I'd do the following:
// the dataset
val loc = Seq("8106f510000dc502").toDF("Loc")
// the udf for decoding loc
// (charAt is 0-based, so codes are used as 0-based indices here)
def mydecode(codes: Seq[Int]) = udf { s: String =>
  codes.map(pos => s.charAt(pos)).mkString
}
val codes = Seq(3,1,5,7,6,4,2)
val decoded = loc.withColumn("decoded", mydecode(codes)($"loc"))
scala> decoded.show
+----------------+-------+
| Loc|decoded|
+----------------+-------+
|8106f510000dc502|61501f0|
+----------------+-------+
I'm leaving the range in the codes array, i.e. (8-16), as your home exercise.
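For reference, here is one possible way to do the exercise, as a plain-Scala sketch that could be wrapped in udf the same way as mydecode. It assumes the positions in the question are 1-based (which is what the expected result 08f150000dc50261 implies), whereas charAt is 0-based, hence the p - 1. The names DecodeSketch and reorder are made up for illustration.

```scala
object DecodeSketch {
  // 1-based positions from the question, with the range (8-16) expanded.
  val codes: Seq[Int] = Seq(3, 1, 5, 7, 6) ++ (8 to 16) ++ Seq(4, 2)

  // Picks the character at each 1-based position, skipping positions
  // that fall outside the string (so shorter inputs do not throw).
  def reorder(s: String, positions: Seq[Int]): String =
    positions.collect { case p if p >= 1 && p <= s.length => s.charAt(p - 1) }.mkString
}
```

With these definitions, DecodeSketch.reorder("8106f510000dc502", DecodeSketch.codes) gives 08f150000dc50261, the value asked for in the question.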
Upvotes: 2
Reputation: 22439
Another approach, using a regular expression and a UDF that takes the separator as a parameter (in this case "z"):
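def flip(sep: String) = udf(
  (s: String) => {
    // quote the separator so regex metacharacters in it are matched literally
    val pattern = s"""(.*?)${java.util.regex.Pattern.quote(sep)}(.*)""".r
    s match {
      case pattern(a, b) => b + sep + a
      case _ => s // no separator found: leave the value unchanged
    }
  }
)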
val df = Seq( ("100z200") ).toDF("Loc")
val dfFlipped = df.withColumn("Flipped", flip("z")($"Loc"))
dfFlipped.show
+-------+-------+
| Loc|Flipped|
+-------+-------+
|100z200|200z100|
+-------+-------+
Upvotes: 0