Learn Hadoop

Reputation: 3060

pyspark replace repeated backslash character with empty string

In PySpark, how do I replace the text "\"\"" with an empty string? I tried regexp_replace(F.col('new'), '\\', ''), but it is not working.

The .csv file contains:

|"\\\"\\\""|

df.show() displays it like this:

\"\"

But I am expecting an empty string ('') to be printed.

Upvotes: 1

Views: 1602

Answers (2)

Hernando Abella

Reputation: 326

The text and the pattern you're using don't match each other.

When printed, the text you gave as an example evaluates to "" (two double quotes), while the pattern evaluates to \ (a single backslash).

Try running the following in the playground to see what I mean.

print("\"\"")
print('\\')

I'm not sure about the rest, as I haven't used pyspark, and your code snippet may not include enough information to determine whether there are any other issues.
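To make the mismatch concrete, here is a minimal sketch using Python's built-in re module rather than PySpark. It assumes the value shown by df.show is the four characters \"\" and the variable names are illustrative only. Spark's regexp_replace uses Java regular expressions, which to my knowledge also reject a lone backslash as an incomplete pattern, but that part is not verified here.

```python
import re

one_backslash = '\\'         # Python literal: a single backslash character
print(len(one_backslash))    # length of the string the regex engine receives

# A lone backslash is an unfinished escape, so Python's re refuses to compile it.
try:
    re.compile(one_backslash)
    lone_backslash_is_valid = True
except re.error:
    lone_backslash_is_valid = False

# To match one literal backslash, the regex itself must be two characters: \\
# A raw string avoids having to double the backslashes a second time.
backslash_pattern = r'\\'

sample = r'\"\"'             # assumed value: backslash, quote, backslash, quote
without_backslashes = re.sub(backslash_pattern, '', sample)
print(without_backslashes)   # the two double quotes are still there
```

Removing only the backslashes leaves the two double quotes behind, so the pattern also has to cover the quotes if the goal is an empty string.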

Upvotes: 0

Mohana B C

Reputation: 5487

You should escape the quotes and the backslash (\) in the regex.

Regex for text "\"\"" is \"\\\"\\\"\"

The Spark Scala code below works fine, and the same approach should work in PySpark as well.

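  // Needed outside spark-shell; assumes a SparkSession named `spark` is in scope.
  import org.apache.spark.sql.functions.regexp_replace
  import spark.implicits._

  val inDF = List(""""\"\""""").toDF()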

  inDF.show()

   /*
   +------+
   | value|
   +------+
   |"\"\""|
   +------+
   */
  
  inDF.withColumn("value", regexp_replace('value, """\"\\\"\\\"\"""", "")).show()

   /*
   +-----+
   |value|
   +-----+
   |     |
   +-----+
    */
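A rough PySpark translation might look like the sketch below. The helper strip_quotes_and_backslashes is hypothetical, and the column name passed to it (for example 'new' from the question) is an assumption. EXACT_PATTERN is the regex above written as a Python raw string. ANY_PATTERN is a different technique, a character class that removes every backslash and double quote wherever they occur. The checks at the bottom use Python's re module, not Spark, so they only confirm the patterns themselves.

```python
import re

# The regex from the answer, as a raw string: matches exactly  " \ " \ " "
EXACT_PATTERN = r'\"\\\"\\\"\"'

# Alternative (not from the answer): any single backslash or double quote.
ANY_PATTERN = r'[\\"]'

def strip_quotes_and_backslashes(df, column, pattern=ANY_PATTERN):
    """Hypothetical helper: replace every match of `pattern` in `column` with ''."""
    # Imported lazily so the plain-Python checks below run without Spark installed.
    from pyspark.sql import functions as F
    return df.withColumn(column, F.regexp_replace(F.col(column), pattern, ''))

# Plain-Python checks of the two patterns.
scala_value = '"\\"\\""'   # six characters, as in the Scala example
shown_value = r'\"\"'      # four characters, as shown by df.show in the question

exact_on_scala = re.sub(EXACT_PATTERN, '', scala_value)
exact_on_shown = re.sub(EXACT_PATTERN, '', shown_value)
any_on_scala = re.sub(ANY_PATTERN, '', scala_value)
any_on_shown = re.sub(ANY_PATTERN, '', shown_value)
```

With a DataFrame df, the call would be strip_quotes_and_backslashes(df, 'new').show(). The exact pattern only matches when the surrounding quotes are present in the value, while the character class also empties the shorter \"\" form, at the cost of stripping those characters from any other text in the column.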

Upvotes: 1
