Hu.Cai

Reputation: 33

Bug in Spark SQL function rtrim?

When the trim string contains a . character, the result is wrong. Note that the first_specific_businessn_line_code column is of type string. The following two statements always show the same result:


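import org.apache.spark.sql.functions.rtrim

// trim string ".0"
newData
  .withColumn("c", rtrim($"first_specific_businessn_line_code", ".0"))
  .show(false)

// trim string "\\.0" (dot escaped)
newData
  .withColumn("c", rtrim($"first_specific_businessn_line_code", "\\.0"))
  .show(false)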

+----------------------------------+---+
|first_specific_businessn_line_code|c  |
+----------------------------------+---+
|8.0                               |8  |
|80.0                              |8  |
+----------------------------------+---+


Upvotes: 0

Views: 195

Answers (2)

mck

Reputation: 42352

You can use regexp_replace to remove .0 only when it appears at the end of the string ($ in the regex anchors the match to the end):

newData.withColumn("c", regexp_replace($"first_specific_businessn_line_code", "\\.0$", "")).show
+----------------------------------+---+
|first_specific_businessn_line_code|  c|
+----------------------------------+---+
|                               8.0|  8|
|                              80.0| 80|
+----------------------------------+---+
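Spark's regexp_replace interprets its pattern as a Java regular expression, so the same pattern can be checked with plain String.replaceAll outside Spark. A minimal sketch; the object and method names are made up for illustration:

```scala
// Hypothetical helper: applies the same Java regex as the answer above
// ("\\.0$" in Scala source is the regex \.0$ : a literal dot, then 0, at the end).
object SuffixRegexDemo {
  val TrailingDotZero = "\\.0$"

  // Remove ".0" only when it is the final two characters of the string.
  def dropTrailingDotZero(s: String): String = s.replaceAll(TrailingDotZero, "")

  def main(args: Array[String]): Unit =
    Seq("8.0", "80.0", "80").foreach(v => println(s"$v -> ${dropTrailingDotZero(v)}"))
}
```

Because of the $ anchor, a value with no .0 suffix (such as the extra illustrative input 80) is returned unchanged.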

Upvotes: 0

Mohana B C

Reputation: 5487

That's not a bug. rtrim treats its second argument as a set of characters, not as a substring: it removes trailing characters as long as each one appears in that set.

Please check the documentation of the rtrim function.

rtrim("80.0", ".0") removes every trailing . or 0 character from the column value until it reaches a character outside that set. For 80.0 it strips 0, . and 0, stops at 8, so the result is 8.
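To make the character-set behaviour concrete without a Spark session, here is a minimal plain-Scala sketch. rtrimChars and stripExactSuffix are hypothetical helpers written for illustration, not Spark code; rtrimChars only assumes the semantics described above (drop trailing characters while they belong to the trim set). It also shows why escaping the dot does not help: rtrim does not interpret its argument as a regex, so "\\.0" is just the three-character set \, . and 0, and 80.0 still becomes 8.

```scala
// Hypothetical emulation of rtrim(col, trimString) semantics; not Spark's implementation.
object RtrimSemantics {
  // Drop trailing characters while each one is contained in trimChars.
  def rtrimChars(s: String, trimChars: String): String =
    s.reverse.dropWhile(c => trimChars.indexOf(c) >= 0).reverse

  // Remove the exact suffix at most once (the behaviour the question expected).
  def stripExactSuffix(s: String, suffix: String): String = s.stripSuffix(suffix)

  def main(args: Array[String]): Unit = {
    val plainSet   = ".0"
    val escapedSet = "\\.0" // the characters \ . 0, not a regex
    for (v <- Seq("8.0", "80.0")) {
      val a = rtrimChars(v, plainSet)
      val b = rtrimChars(v, escapedSet)
      val c = stripExactSuffix(v, plainSet)
      println(s"$v -> rtrim: $a, rtrim escaped: $b, exact suffix: $c")
    }
  }
}
```

String.stripSuffix removes the exact suffix once, so 80.0 becomes 80; inside Spark, a regex-based function gives that effect instead of rtrim.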

You can use regexp_replace or regexp_extract instead to get the expected result.

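import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("OFF")
import spark.implicits._

Seq("8.0", "80.0").toDF() // single string column named "value"
  // remove a literal "." followed by one or more digits
  .withColumn("regexp_replace", regexp_replace($"value", "[.]\\d+", ""))
  // keep capture group 1: the digits before the (escaped) dot
  .withColumn("regexp_extract", regexp_extract($"value", "(\\d+)\\.(\\d+)", 1))
  .show()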

 /* output
 +-----+--------------+--------------+
 |value|regexp_replace|regexp_extract|
 +-----+--------------+--------------+
 |  8.0|             8|             8|
 | 80.0|            80|            80|
 +-----+--------------+--------------+

 */
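The regexp_extract call can be approximated outside Spark with java.util.regex, which is the regex engine Spark uses. This is a minimal sketch: extractGroup is a hypothetical helper, and it assumes regexp_extract's documented behaviour of taking the first match found anywhere in the string and returning an empty string when there is no match. The pattern (\\d+)\\.(\\d+) escapes the dot so it only matches a literal ".".

```scala
import java.util.regex.Pattern

// Hypothetical approximation of regexp_extract(col, regex, idx) for one string value.
object ExtractDemo {
  def extractGroup(s: String, regex: String, idx: Int): String = {
    val m = Pattern.compile(regex).matcher(s)
    // First match anywhere in the input; "" if there is no match or the group is unset.
    if (m.find()) Option(m.group(idx)).getOrElse("") else ""
  }

  def main(args: Array[String]): Unit = {
    val intPart = "(\\d+)\\.(\\d+)" // group 1 = digits before the literal dot
    Seq("8.0", "80.0", "80").foreach(v => println(s"$v -> '${extractGroup(v, intPart, 1)}'"))
  }
}
```

The two approaches differ for a value without a dot (80 here is an illustrative extra input): regexp_replace leaves it unchanged, while regexp_extract yields an empty string.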

Upvotes: 2
