Let's say I have a DataFrame like this:
from pyspark.sql import functions as F

df = spark.createDataFrame(
[
('Test1 This is a test Test2','This is a test'),
('That is','That')
],
['text','name'])
+--------------------------+--------------+
|text                      |name          |
+--------------------------+--------------+
|Test1 This is a test Test2|This is a test|
|That is                   |That          |
+--------------------------+--------------+
If I apply df.withColumn("new",F.expr("regexp_replace(text,name,'')")).show(truncate=False)
it works fine and results in
+--------------------------+--------------+------------+
|text                      |name          |new         |
+--------------------------+--------------+------------+
|Test1 This is a test Test2|This is a test|Test1  Test2|
|That is                   |That          | is         |
+--------------------------+--------------+------------+
Now let's say I have the following DataFrame instead:
+-----------------------------+-----------------+
|text                         |name             |
+-----------------------------+-----------------+
|Test1 This is a test(+1 Test2|This is a test(+1|
|That is                      |That             |
+-----------------------------+-----------------+
If I apply the command from above, I get the following error message:
java.util.regex.PatternSyntaxException: Dangling meta character '+'
How can I avoid this exception in the most "pyspark" way, while keeping the value in text as is?
Thanks
Upvotes: 3
Views: 1347
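Instead of regexp_replace, use the replace function in Spark. It treats the search string literally rather than as a regular expression, so metacharacters such as ( and + need no escaping.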
replace(str, search[, replace]) - Replaces all occurrences of search with replace.
Example:
df.show(10,False)
#+-----------------------------+-----------------+
#|text                         |name             |
#+-----------------------------+-----------------+
#|Test1 This is a test(+1 Test2|This is a test(+1|
#|That is                      |That             |
#+-----------------------------+-----------------+
df.withColumn("new",F.expr("replace(text,name,'')")).show(10,False)
#+-----------------------------+-----------------+------------+
#|text                         |name             |new         |
#+-----------------------------+-----------------+------------+
#|Test1 This is a test(+1 Test2|This is a test(+1|Test1  Test2|
#|That is                      |That             | is         |
#+-----------------------------+-----------------+------------+
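To see why the literal approach sidesteps the error, here is a small sketch in plain Python rather than Spark. It assumes Python's re module is a fair stand-in for Java's regex engine on this one point (both reject a `+` with nothing before it to repeat), and the variable names are only for illustration.

```python
import re

# The row from the question that triggers the exception.
text = "Test1 This is a test(+1 Test2"
name = "This is a test(+1"

# Using `name` as a regex pattern fails because "(+1" is not valid regex syntax.
try:
    re.sub(name, "", text)
    regex_error = None
except re.error as exc:
    regex_error = exc

# A literal replacement never parses `name` as a pattern,
# which is the behaviour replace() relies on.
literal_result = text.replace(name, "")

# Escaping the metacharacters first makes the regex route act literally as well.
escaped_result = re.sub(re.escape(name), "", text)

print(regex_error)
print(repr(literal_result))
print(repr(escaped_result))
```

Both successful variants leave the two spaces that surrounded the removed substring, the same as in the Spark output above.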
Upvotes: 4