Reputation: 33
When the trim string passed to rtrim contains the . character, the result is wrong. Please keep in mind that the data type of the first_specific_businessn_line_code column is string. The following two statements always show the same result.
newData.withColumn("c", rtrim($"first_specific_businessn_line_code", ".0")).show(false)
newData.withColumn("c", rtrim($"first_specific_businessn_line_code", "\\.0")).show(false)
+----------------------------------+---+
|first_specific_businessn_line_code|c  |
+----------------------------------+---+
|8.0                               |8  |
|80.0                              |8  |
+----------------------------------+---+
Upvotes: 0
Views: 195
Reputation: 42352
You can use regexp_replace to remove the .0 that appears at the end of the string ($ in regex):
newData.withColumn("c", regexp_replace($"first_specific_businessn_line_code", "\\.0$", "")).show
+----------------------------------+---+
|first_specific_businessn_line_code|  c|
+----------------------------------+---+
|                               8.0|  8|
|                              80.0| 80|
+----------------------------------+---+
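To see why the $ anchor matters, here is a minimal sketch in plain Scala, without Spark. It relies on String.replaceAll, which uses the same java.util.regex syntax as regexp_replace. The object and method names are hypothetical and exist only for this illustration.

```scala
object AnchorSketch {
  // Anchored: removes ".0" only when it is the very end of the string.
  def stripTrailingPointZero(s: String): String = s.replaceAll("\\.0$", "")

  // Unanchored: removes every ".0" it finds, wherever it occurs.
  def stripAnyPointZero(s: String): String = s.replaceAll("\\.0", "")

  def main(args: Array[String]): Unit = {
    println(stripTrailingPointZero("80.0")) // 80
    println(stripTrailingPointZero("8.05")) // 8.05 (left untouched)
    println(stripAnyPointZero("8.05"))      // 85 (corrupts the value)
  }
}
```

Without the anchor, a value such as 8.05 would silently lose its ".0" in the middle, so the anchored pattern is the safer choice when only a trailing .0 should go.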
Upvotes: 0
Reputation: 5487
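That's not a bug. rtrim treats its second argument as a set of characters, not as a suffix, and removes every trailing character that belongs to that set.
See the documentation for the rtrim function.
rtrim("80.0", ".0") --> This removes every trailing . and 0 from the column value, so the trailing 0, the ., and the 0 before it are all stripped and the result is 8. Passing "\\.0" only adds \ to the set, which is why both statements give the same output.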
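The difference between trimming a set of characters and removing a suffix can be sketched in plain Scala, without Spark. The helper rtrimChars below is hypothetical: it only mimics the behavior described above and is not Spark's implementation.

```scala
object RtrimSketch {
  // Hypothetical helper: drops trailing characters for as long as each one
  // belongs to the set `chars`, mimicking how rtrim uses its trim string.
  def rtrimChars(s: String, chars: String): String =
    s.reverse.dropWhile(c => chars.indexOf(c) >= 0).reverse

  // Suffix removal: strips ".0" only if the string ends with exactly ".0".
  def dropSuffix(s: String, suffix: String): String = s.stripSuffix(suffix)

  def main(args: Array[String]): Unit = {
    println(rtrimChars("80.0", ".0"))   // 8
    println(rtrimChars("80.0", "\\.0")) // 8 (the backslash is just one more character in the set)
    println(dropSuffix("80.0", ".0"))   // 80
  }
}
```

The suffix version gives the result the question expects, which is what the regex-based answers achieve inside Spark.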
You can use regexp_replace/regexp_extract to achieve the result.
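import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("OFF")
import spark.implicits._

Seq("8.0", "80.0").toDF()
  // Remove a dot and the digits that follow it (any decimal part, not only ".0").
  .withColumn("regexp_replace", regexp_replace($"value", "[.]\\d+", ""))
  // Capture the digits before the dot; the dot is escaped so it matches only a literal ".".
  .withColumn("regexp_extract", regexp_extract($"value", "(\\d+)\\.(\\d+)", 1))
  .show()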
/* output
+-----+--------------+--------------+
|value|regexp_replace|regexp_extract|
+-----+--------------+--------------+
|  8.0|             8|             8|
| 80.0|            80|            80|
+-----+--------------+--------------+
*/
Upvotes: 2