mohan chakradhar v

Reputation: 1

pyspark regexp_replace replacing multiple values in a column

I have the URL https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\r in a dataset. I want to remove https:// from the start of the string and \r from the end of the string.

Creating a DataFrame to replicate the issue:

c = spark.createDataFrame([('https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\r',)], ['str'])

I tried the regexp_replace below, using a pipe to match either pattern, but it does not work as expected.

from pyspark.sql import functions as F
c.select(F.regexp_replace('str', 'https:// | \\r', '')).first()

Actual output: www.youcuomizei.comEquaion-Kid-Backack-Peronalized301793

Expected output: www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793

Upvotes: 0

Views: 441

Answers (1)

iambdot

Reputation: 945

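The backslash-r (\r) does not show up in your original spark.createDataFrame object because Python treats \r inside a normal string as an escape sequence (a carriage return), so you have to escape the backslash. Your spark.createDataFrame call should look like this; note the double backslash: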

c = spark.createDataFrame([("https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\\r",)], ['str'])

which will give this output:

+------------------------------------------------------------------------------+
|str                                                                           |
+------------------------------------------------------------------------------+
|https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\r|
+------------------------------------------------------------------------------+
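To check which of the two cases you actually have, here is a quick plain-Python sketch (no Spark needed). The variable names are made up for illustration, and the short string stands in for the end of the URL:

```python
# '\r' is ONE character (a carriage return).
# '\\r' is TWO characters (a backslash followed by the letter r).
with_cr = "301793\r"        # hypothetical value ending in a real carriage return
with_literal = "301793\\r"  # hypothetical value ending in backslash + r

print(len(with_cr))        # 7
print(len(with_literal))   # 8
print(repr(with_cr))       # '301793\r'
print(repr(with_literal))  # '301793\\r'
```

If the data came from a file with Windows line endings, the first case (a real carriage return) is the more likely one, but that is an assumption about your data.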

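Your regex https://|[\\r] will not remove the \r, because once Python has processed the string the regex engine sees [\r], which matches a carriage return rather than a backslash followed by r. The regex should be:

# "[\\\\]r" reaches the regex engine as [\\]r: a literal backslash, then r
c = (c
    .withColumn("str", F.regexp_replace("str", "https://|[\\\\]r", ""))
)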

which will give this output:

+--------------------------------------------------------------------+
|str                                                                 |
+--------------------------------------------------------------------+
|www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793|
+--------------------------------------------------------------------+
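If you are not sure whether the column holds a real carriage return or a literal backslash-r, one pattern can cover both. This is only a sketch using Python's re module as a stand-in for Spark's Java regex engine (for a pattern this simple I would expect the two to behave the same, but that is an assumption). The names pattern, with_cr and with_literal are illustrative. The last line also shows that a character class reproduces the "Actual output" from the question, which suggests the pattern that was really run had square brackets around it; that is a guess, since the posted pattern has none.

```python
import re

url = "https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793"
expected = "www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793"

with_cr = url + "\r"        # ends with a real carriage return
with_literal = url + "\\r"  # ends with backslash + r

# ^https:// only matches at the start; (?:\r|\\r)$ only matches at the end.
# In a raw string, \r is the regex escape for carriage return and
# \\r is an escaped backslash followed by the letter r.
pattern = r"^https://|(?:\r|\\r)$"

cleaned_cr = re.sub(pattern, "", with_cr)
cleaned_literal = re.sub(pattern, "", with_literal)
print(cleaned_cr == expected, cleaned_literal == expected)  # True True

# Pitfall: inside [...] every character is matched on its own,
# so h, t, p, s, :, /, | and the carriage return are all stripped.
mangled = re.sub(r"[https://|\r]", "", with_cr)
print(mangled)  # www.youcuomizei.comEquaion-Kid-Backack-Peronalized301793
```

In PySpark the same pattern string should be usable as F.regexp_replace("str", pattern, ""). Anchoring with ^ and $ also means an https:// or \r in the middle of a value is left alone, which matches what the question asks for.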

Upvotes: 0
