Reputation: 67
There is a column named batch in my dataframe. It has values like '9%', '$5', etc.
I need to use regexp_replace in a way that removes the special characters from the values above and keeps just the numeric part.
For example, 9% and $5 should become 9 and 5 respectively, in the same column.
Upvotes: 3
Views: 40720
Reputation: 436
You can use this regex:
\W+
\W matches any non-word character (equivalent to [^a-zA-Z0-9_]), and + repeats it one or more times.
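To see what this pattern does to the sample values, here is a minimal sketch using Python's built-in re module instead of Spark (an assumption, so that it runs without a cluster). For plain ASCII input like this, Python's \W should behave like the Java regex class that Spark's regexp_replace relies on. The helper name strip_non_word is hypothetical.

```python
import re

def strip_non_word(value: str) -> str:
    """Remove every run of non-word characters (anything outside [a-zA-Z0-9_])."""
    return re.sub(r"\W+", "", value)

# Sample values from the question, plus one extra (made up) to show what \W keeps.
samples = ["9%", "$5", "batch_7!"]
cleaned = [strip_non_word(s) for s in samples]
print(cleaned)  # ['9', '5', 'batch_7']
```

Note that \W leaves letters and underscores in place, so if the column can contain those, a digits-only class such as [^0-9] is the stricter choice.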
Upvotes: 2
Reputation: 11244
What have you tried so far?
select regexp_replace("'$5','9%'","[^0-9A-Za-z]","")
Upvotes: 1
Reputation: 6218
df.withColumn("batch", regexp_replace(col("batch"), "[^0-9]+", ""))
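The pattern should be written without surrounding / delimiters: Spark uses Java regex syntax, where a slash is an ordinary character, so a slash-wrapped pattern only matches text that actually contains slashes. A rough way to check this locally is with Python's re module, which treats slashes the same way (this is a stand-in for Spark, not a Spark call; keep_digits is a hypothetical helper name).

```python
import re

def keep_digits(value: str) -> str:
    """Drop every run of characters that are not ASCII digits."""
    return re.sub(r"[^0-9]+", "", value)

# First two values come from the question; the third is a made-up extra case.
samples = ["9%", "$5", "1,200 USD"]
digits = [keep_digits(s) for s in samples]
print(digits)  # ['9', '5', '1200']

# With JavaScript-style delimiters the slashes are matched literally,
# so nothing in "9%" matches and the value comes back unchanged.
unchanged = re.sub(r"/[^0-9]+/", "", "9%")
print(unchanged)  # 9%
```

The result is still a string column, so a cast to an integer type would be a separate step if numeric values are needed.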
Upvotes: 7