I have a few documents whose text contains hyphens at the start of a word, between two words (as in singapore-based), and at both the start and end of a word. I need help with a regex that removes the hyphens in these three cases.
Sample text: "ease singapore-based -fis -sgfis- fatca"
I tried the following regex:
re.sub(r'[^A-Za-z0-9]+', ' ', "ease singapore-based fis -sgfis- fatca")
but it removes all hyphens.
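A quick check of what the attempted pattern actually does: the character class targets every non-alphanumeric character, not just hyphens. The second input string is a hypothetical example added for illustration, not from the question.

```python
import re

text = "ease singapore-based -fis -sgfis- fatca"

# [^A-Za-z0-9]+ matches any run of characters that are not ASCII letters or
# digits, so every hyphen (including the one inside "singapore-based") and
# the spaces around it collapse into a single space.
attempt = re.sub(r'[^A-Za-z0-9]+', ' ', text)
print(attempt)  # ease singapore based fis sgfis fatca

# The same class also hits other punctuation (hypothetical input).
other = re.sub(r'[^A-Za-z0-9]+', ' ', "fatca, u.s. rules")
print(other)  # fatca u s rules
```

So the pattern is broader than "hyphens only": commas, periods and any other symbols are replaced as well.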
Solution 1:
re.sub(r'-\b|\b-', ' ', "ease singapore-based fis -sgfis- fatca")
# then collapse the double spaces this leaves behind
Expression 1:
"-\b|\b-"
\b is a word boundary, so -\b matches a hyphen followed by a word character and \b- matches a hyphen preceded by one.
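A minimal sketch of Solution 1 with the space trimming filled in, run on the question's sample text. Using re.sub(r' {2,}', ' ', ...) for the trimming step is my own choice, not part of the answer.

```python
import re

text = "ease singapore-based -fis -sgfis- fatca"

# Step 1: replace a hyphen followed by a word boundary (-\b) or preceded
# by one (\b-) with a space. Hyphens next to a space become a second space.
spaced = re.sub(r'-\b|\b-', ' ', text)
print(spaced)  # ease singapore based  fis  sgfis  fatca

# Step 2: collapse runs of two or more spaces into one.
result = re.sub(r' {2,}', ' ', spaced)
print(result)  # ease singapore based fis sgfis fatca
```

' '.join(spaced.split()) would also work for step 2 and additionally strips leading and trailing whitespace.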
Or, Solution 2:
re.sub(r'\s-\b|\b-\s', ' ', "ease singapore-based fis -sgfis- fatca")
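Expression 2:
"\s-\b|\b-\s"
\s matches a whitespace character, so only hyphens with whitespace on one side and a word character on the other are replaced; the hyphen inside "singapore-based" is left alone.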
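A small check of Solution 2 on the question's sample text, showing that the inner hyphen survives. The extra "-fis" input in the second call is a hypothetical edge case, not from the question.

```python
import re

text = "ease singapore-based -fis -sgfis- fatca"

# " -" before a word and "- " after a word are each replaced by one space,
# so no double spaces appear; "singapore-based" has no adjacent whitespace.
result = re.sub(r'\s-\b|\b-\s', ' ', text)
print(result)  # ease singapore-based fis sgfis fatca

# A hyphen at the very start of the string has no whitespace before it,
# so this pattern leaves it in place.
print(re.sub(r'\s-\b|\b-\s', ' ', "-fis"))  # -fis
```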
If you also need "singapore-based" to become "singapore based", combine Solution 2 with \b-\b (a hyphen between two word characters), which gives (\b-\b)|(\s-\b|\b-\s).
Solution 3:
re.sub(r'(\b-\b)|(\s-\b|\b-\s)', ' ', "ease singapore-based fis -sgfis- fatca")
# no space trimming required
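Solution 3 run on the exact sample from the question (with "-fis" rather than "fis"), plus a check that the capturing groups are not needed for the substitution itself.

```python
import re

text = "ease singapore-based -fis -sgfis- fatca"

# Alternative 1 (\b-\b): a hyphen between two word characters -> space.
# Alternative 2 (\s-\b|\b-\s): whitespace plus hyphen -> a single space,
# which is why no double spaces are produced.
result = re.sub(r'(\b-\b)|(\s-\b|\b-\s)', ' ', text)
print(result)  # ease singapore based fis sgfis fatca

# The parentheses only group the alternatives; dropping them gives the same result.
ungrouped = re.sub(r'\b-\b|\s-\b|\b-\s', ' ', text)
print(ungrouped == result)  # True
```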
Here is one approach:
inp = "ease singapore-based -fis -sgfis- fatca"
output = re.sub(r'(?<=\w)-|-(?=\w)', '', inp)
print(output) # ease singaporebased fis sgfis fatca
The regex used above says to match:
(?<=\w)- match a hyphen preceded by a word character
| OR
-(?=\w) match a hyphen followed by a word character
We then replace each matching hyphen with an empty string, which removes it.
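The same lookaround pattern, wrapped in a small helper. The names strip_word_hyphens and HYPHEN_NEXT_TO_WORD are hypothetical, and the "before - after" input is a made-up example showing that a free-standing dash between spaces is kept, because it touches no word character.

```python
import re

# (?<=\w)- : a hyphen preceded by a word character (lookbehind, not consumed)
# -(?=\w)  : a hyphen followed by a word character (lookahead, not consumed)
# Because the lookarounds consume nothing, only the hyphen itself is removed.
HYPHEN_NEXT_TO_WORD = re.compile(r'(?<=\w)-|-(?=\w)')


def strip_word_hyphens(text: str) -> str:
    """Remove hyphens that touch a word character on either side."""
    return HYPHEN_NEXT_TO_WORD.sub('', text)


print(strip_word_hyphens("ease singapore-based -fis -sgfis- fatca"))
# ease singaporebased fis sgfis fatca

# A dash with spaces on both sides matches neither alternative.
print(strip_word_hyphens("before - after"))  # before - after
```

Unlike Solution 3, this joins the parts of "singapore-based" instead of inserting a space; swap '' for ' ' in the substitution only if you also handle the resulting double spaces.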