Reputation: 3060
In PySpark, how do I replace the text "\"\"" with an empty string? I tried regexp_replace(F.col('new'), '\\', ''), but it does not work.
The .csv file contains:
|"\\\"\\\""|
df.show() displays it like this:
\"\"
But I expect it to print an empty string ('').
Upvotes: 1
Views: 1602
Reputation: 326
The text and the pattern you're using don't match each other.
The text you gave as an example prints as "" while the pattern prints as \
Try running the following in the playground to see what I mean.
print("\"\"")
print('\\')
Not sure about the rest as I haven't used pyspark and your code snippet may not include enough information to determine if there are any other issues.
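To make the escaping point concrete, here is a minimal sketch using Python's standard `re` module rather than Spark. It assumes the column value is exactly what df.show() displays in the question (backslash, quote, backslash, quote); the variable names are illustrative only. Spark uses Java regular expressions, but for a simple pattern like this one the two engines should agree.

```python
import re

# The value as df.show() displays it in the question: \"\"
# In Python source each backslash has to be written as \\ .
shown = '\\"\\"'
print(shown)        # \"\"
print(len(shown))   # 4

# In a regex a literal backslash must itself be escaped.
# The raw string r'\\"' is the regex \\" : one backslash followed by one double quote.
cleaned = re.sub(r'\\"', '', shown)
print(repr(cleaned))  # ''
```

If this is the real column content, the same raw-string pattern could presumably be passed to regexp_replace in PySpark, but that part is untested here.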
Upvotes: 0
Reputation: 5487
You should escape the quotes and the backslash (\) in the regex.
The regex for the text "\"\""
is \"\\\"\\\"\"
The Spark Scala code below works, and the same approach should work in PySpark.
val inDF = List(""""\"\""""").toDF()
inDF.show()
/*
+------+
| value|
+------+
|"\"\""|
+------+
*/
inDF.withColumn("value", regexp_replace('value, """\"\\\"\\\"\"""", "")).show()
/*
+-----+
|value|
+-----+
|     |
+-----+
*/
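A rough PySpark translation of the Scala snippet above might look like the sketch below. The helper name `strip_quoted` and the constant `PATTERN` are hypothetical, and the column name `value` is taken from the Scala example. A raw string avoids double escaping, and because double quotes need no escaping in a regex, only the backslashes are doubled. The pattern is checked here with Python's `re` module; Spark uses Java regex, which should treat this pattern the same way.

```python
import re

# Regex for the six-character text "\"\"" : quote, backslash, quote, backslash, quote, quote.
PATTERN = r'"\\"\\""'


def strip_quoted(df, column="value"):
    """Hypothetical helper: replace the matched text in `column` with an empty string."""
    # Imported lazily so the pattern check below runs even without pyspark installed.
    from pyspark.sql import functions as F

    return df.withColumn(column, F.regexp_replace(F.col(column), PATTERN, ""))


# Check the pattern against the same value the Scala example builds.
sample = '"\\"\\""'
print(sample)                            # "\"\""
print(repr(re.sub(PATTERN, "", sample)))  # ''
```

With an active SparkSession and a DataFrame `inDF` like the one above, `strip_quoted(inDF).show()` would be the call to try.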
Upvotes: 1