Reputation: 1345
I am using the following command to replace "." with "_" in a DataFrame column in Spark (Scala), and it works fine:
rfm = rfm.select(regexp_replace(col("tagname"),"\\.","_") as "tagname",col("value"),col("sensor_timestamp")).persist()
But the following command does not remove the leading spaces from the same column:
rfm = rfm.select(regexp_replace(col("tagname")," ","") as "tagname",col("value"),col("sensor_timestamp")).persist()
There is no error. It just fails to remove the leading spaces that I see in the data.
Input: rfm.show()
+--------------------+-----+----------------+
| tagname |value|timestamp |
+--------------------+-----+----------------+
| P.A |101.5| 1.409643313E12|
| P.A |100.5| 1.409643315E12|
| P.A |100.5| 1.409644709E12|
|P.B | 0.0| 1.40964471E12|
Output:
+--------------------+-----+----------------+
| tagname |value|timestamp |
+--------------------+-----+----------------+
| P_A |101.5| 1.409643313E12|
| P_A |100.5| 1.409643315E12|
| P_A |100.5| 1.409644709E12|
|P_B | 0.0| 1.40964471E12|
Upvotes: 2
Views: 9062
Reputation: 1011
You have to match a whitespace pattern, not just a single literal space. Provide it as below:
regexp_replace(col("tagname"),"\\s+","")
\s+ matches one or more whitespace characters, and the extra \ escapes the backslash in \s inside the string literal passed to the method.
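To see how the choice of pattern matters, here is a minimal plain-Scala sketch. It uses String.replaceAll rather than Spark, on the understanding that Spark's regexp_replace follows Java regular expression syntax, so these patterns should behave the same way there. The object name, helper names, and sample strings are hypothetical. The tab example is only one possible explanation for why a literal " " pattern appeared to do nothing; the question does not show the raw characters.

```scala
object LeadingWhitespaceDemo {
  // Hypothetical sample: two leading spaces and one interior space.
  val sample: String = "  P.A tag"

  // Removes every run of whitespace, including interior whitespace.
  def stripAll(s: String): String = s.replaceAll("\\s+", "")

  // Removes only the whitespace at the start of the string.
  def stripLeading(s: String): String = s.replaceAll("^\\s+", "")

  // Removes only literal space characters; tabs and newlines are kept.
  def stripLiteralSpaces(s: String): String = s.replaceAll(" ", "")

  def main(args: Array[String]): Unit = {
    println(s"[${stripAll(sample)}]")           // prints [P.Atag]
    println(s"[${stripLeading(sample)}]")       // prints [P.A tag]
    // A leading tab survives the literal-space pattern but not \s+.
    println(stripLiteralSpaces("\tP.B") == "\tP.B") // prints true
    println(stripLeading("\tP.B"))              // prints P.B
  }
}
```

If only leading whitespace should go and interior spaces must be preserved, anchoring the pattern with ^ is the safer choice. Spark also provides ltrim and trim in org.apache.spark.sql.functions, which may be simpler when the unwanted characters are plain spaces.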
Upvotes: 3