Reputation: 75
I need to find and replace any space between digits in a long string using regular expression.
I have tried to use regular expression such as [0-9][\s][0-9]
and then regexp_replace such as .withColumn('free_text', regexp_replace('free_text', '[0-9][\s][0-9]', ''))
.
However, the regex matches 1(space)4
where I would like to have only (space)
Here is an example:
What I have:
"Hello. I am Marie. My number is 768 990"
What I would like to have:
"Hello. I am Marie. My number is 768990"
Thanks,
Upvotes: 2
Views: 1207
Reputation: 163362
Your pattern matches a digit, whitespace character and a digit. Note that \s
also matches a newline.
If supported, you could use lookarounds instead of matching the digits:
(?<=[0-9])\s(?=[0-9])
.withColumn('free_text', regexp_replace('free_text', '(?<=[0-9])\s(?=[0-9])', ''))
Upvotes: 2
Reputation: 521457
Here is one way to do this, using capture groups:
.withColumn('free_text', regexp_replace('free_text', '([0-9])\s([0-9])', '$1$2'))
The idea here is to match and capture the two digits separated by a whitespace character in between them. Then, we can replace by just the two digits adjacent.
Upvotes: 4