go sgenq

Reputation: 323

Regex to remove hyphens at the start of a word, between words, and at both the start and end of a word

I have a few documents whose text contains hyphens at the start of a word, between words, and at both the start and end of a word. I need help with a regex that removes the hyphens in these 3 scenarios.

Sample text: "ease singapore-based -fis -sgfis- fatca"

I tried the following regex, but it removes all hyphens:

re.sub(r'[^A-Za-z0-9]+', ' ', "ease singapore-based fis -sgfis- fatca")

Upvotes: 1

Views: 2198

Answers (2)

DevWL

Reputation: 18840

Solution 1:

re.sub(r'-\b|\b-', ' ', "ease singapore-based fis -sgfis- fatca")
# leaves double spaces around "sgfis"; collapse multiple spaces afterwards

Expression 1:

"-\b|\b-"

\b matches a word boundary, so this expression matches any hyphen that touches a word character on either side.
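
As a minimal sketch of the "trim multiple spaces" step, assuming it is acceptable to collapse every run of whitespace into a single space (the second re.sub call is my addition, not part of the answer above):

```python
import re

text = "ease singapore-based fis -sgfis- fatca"

# Solution 1: replace every hyphen that touches a word character with a space.
spaced = re.sub(r'-\b|\b-', ' ', text)

# Assumed clean-up step: collapse runs of whitespace and trim the ends.
cleaned = re.sub(r'\s+', ' ', spaced).strip()

print(cleaned)  # ease singapore based fis sgfis fatca
```

Note that this clean-up also normalizes tabs and newlines, which may not be wanted if the documents rely on line breaks.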


Solution 2:

re.sub(r'\s-\b|\b-\s', ' ', "ease singapore-based fis -sgfis- fatca")

Expression 2:

"\s-\b|\b-\s"

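\s matches a whitespace character, so only hyphens that sit between whitespace and a word are replaced; a hyphen inside a word such as "singapore-based" is kept.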
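
A quick check of Expression 2 on the sample text from the question (which includes "-fis"), plus one edge case worth knowing about. The second input string is made up for illustration:

```python
import re

pattern = r'\s-\b|\b-\s'

# Hyphens next to whitespace are replaced; the one inside "singapore-based" stays.
sample = re.sub(pattern, ' ', "ease singapore-based -fis -sgfis- fatca")
print(sample)  # ease singapore-based fis sgfis fatca

# Hypothetical edge case: hyphens at the very start or end of the string
# have no adjacent whitespace, so this pattern leaves them in place.
edges = re.sub(pattern, ' ', "-fis sgfis-")
print(edges)  # -fis sgfis-
```

If the text can start or end with a hyphenated token, a lookaround-based pattern is probably a better fit, since it does not need to consume the neighbouring whitespace.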


If you need "singapore-based" to become "singapore based", use Solution 2 and combine it with \b-\b, which matches a hyphen that has a word character on both sides.

You will end up with (\b-\b)|(\s-\b|\b-\s)

Solution 3:

re.sub(r'(\b-\b)|(\s-\b|\b-\s)', ' ', "ease singapore-based fis -sgfis- fatca")
# no space trimming required


Upvotes: 3

Tim Biegeleisen

Reputation: 521229

Here is one approach:

import re

inp = "ease singapore-based -fis -sgfis- fatca"
output = re.sub(r'(?<=\w)-|-(?=\w)', '', inp)
print(output)  # ease singaporebased fis sgfis fatca

The regex used above says to match:

(?<=\w)-  match a hyphen preceded by a word character
|         OR
-(?=\w)   match a hyphen followed by a word character

Then we replace each matching hyphen with an empty string to remove it. Note that this also joins "singapore-based" into "singaporebased".
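
If the in-word hyphen should become a space (or be kept) rather than dropped, one possible variation is to handle the two cases in separate passes. This is only a sketch: strip_hyphens and its joiner parameter are hypothetical names, not part of the answer above.

```python
import re

def strip_hyphens(text, joiner=""):
    """Hypothetical helper: replace in-word hyphens with `joiner`, drop edge hyphens."""
    # Hyphen with a word character on both sides, e.g. "singapore-based".
    text = re.sub(r'(?<=\w)-(?=\w)', joiner, text)
    # Hyphen touching a word character on one side only, e.g. "-fis" or "sgfis-".
    return re.sub(r'(?<!\w)-(?=\w)|(?<=\w)-(?!\w)', '', text)

inp = "ease singapore-based -fis -sgfis- fatca"
print(strip_hyphens(inp))       # ease singaporebased fis sgfis fatca
print(strip_hyphens(inp, " "))  # ease singapore based fis sgfis fatca
print(strip_hyphens(inp, "-"))  # ease singapore-based fis sgfis fatca
```

The second pattern uses negative lookarounds so that a hyphen re-inserted by joiner="-" is not stripped again in the second pass.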

Upvotes: 2
