Reputation: 157
Suppose I have a string string = 'abcdefghi'
and I want the output as 'a-b-c-d-e-f-g-h-i'
I can easily use '-'.join(string)
and get the required output. But how would I do the same using regex?
I am asking because I'm learning to use regex and would like to know how to think in it.
Upvotes: 4
Views: 1468
Reputation: 174786
Keep it Simple .....
>>> string = 'abcdefghi'
>>> import re
>>> re.sub(r'\B', '-', string)
'a-b-c-d-e-f-g-h-i'
\b matches between a word character and a non-word character. \B does the opposite of \b: it matches between two word characters or between two non-word characters.
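To see where these two assertions actually fire, here is a minimal sketch that lists the match positions of each. The sample string and the helper name `boundary_positions` are made up for illustration and are not part of the answer.

```python
import re

# Hypothetical sample: two "words" separated by a comma and a space.
sample = 'ab, cd'

def boundary_positions(pattern, text):
    """Return the start index of every (zero-width) match of pattern."""
    return [m.start() for m in re.finditer(pattern, text)]

word_boundaries = boundary_positions(r'\b', sample)  # [0, 2, 4, 6]
non_boundaries = boundary_positions(r'\B', sample)   # [1, 3, 5]

print(word_boundaries)
print(non_boundaries)
```

Every position in the string belongs to exactly one of the two lists, which is why \B alone only inserts a dash between adjacent word characters (or adjacent non-word characters).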
For a more general case:
>>> re.sub(r'(?<=.)(?=.)', '-', string)
'a-b-c-d-e-f-g-h-i'
I'll never let \B go. :)
>>> string = '(a)bc*d+e{f}gh[i]'
>>> re.sub(r'(?<!^)(\B|\b)(?!$)', '-', string)
'(-a-)-b-c-*-d-+-e-{-f-}-g-h-[-i-]'
(?<!^) is a negative lookbehind asserting that the match is not preceded by the start-of-line anchor ^.
(\B|\b) matches a word boundary or a non-word boundary.
(?!$) is a negative lookahead asserting that the match is not followed by the end-of-line anchor $.
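As a quick sanity check, the pattern above can be compared against '-'.join on a few inputs. This is only a sketch: the function name `dash_join` and the extra sample strings are assumptions added for illustration.

```python
import re

def dash_join(text):
    # (\B|\b) succeeds at every position, so the two negative lookarounds
    # do the real work: they exclude the very start and the very end.
    return re.sub(r'(?<!^)(\B|\b)(?!$)', '-', text)

# Hypothetical samples, including the two strings used in this answer.
samples = ['abcdefghi', '(a)bc*d+e{f}gh[i]', 'a b', 'x', '']

for s in samples:
    print(repr(dash_join(s)), dash_join(s) == '-'.join(s))
```

One caveat worth keeping in mind: without re.MULTILINE, $ also matches just before a trailing newline, so a string that ends in '\n' would not get a dash in front of that newline.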
Upvotes: 6
Reputation: 26667
A solution using lookarounds would be:
>>> import re
>>> str="abcdefghi"
>>> re.sub(r'(?<=\w)(?=\w)', '-', str)
'a-b-c-d-e-f-g-h-i'
(?<=\w) asserts that the position is preceded by a word character.
(?=\w) asserts that the position is followed by a word character.
OR
>>> re.sub(r'(?<=.)(?=.)', '-', str)
'a-b-c-d-e-f-g-h-i'
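The two patterns agree on 'abcdefghi' but can differ on other input. A small sketch, assuming a made-up string that contains a space:

```python
import re

# Hypothetical input with a non-word character in the middle.
text = 'ab cd'

# Only positions that sit between two word characters get a dash.
word_only = re.sub(r'(?<=\w)(?=\w)', '-', text)  # 'a-b c-d'

# Any position with a character on both sides gets a dash.
any_char = re.sub(r'(?<=.)(?=.)', '-', text)     # 'a-b- -c-d'

print(word_only)
print(any_char)
```

Here only the second form reproduces '-'.join(text).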
Upvotes: 9
Reputation: 365925
Since '-'.join(s) doesn't care if s is made up of letters, spaces, or anything else, anything using \w or \B is going to be wrong for any string that isn't made up purely of "word characters".
You can easily adapt nu11p01n73R's answer to not rely on word characters:
re.sub(r'(?<=.)(?=.)', '-', s)
But Avinash Raj's answer can't be; it relies on the magic of \b and \B, and there's no corresponding magic class for "character boundary" like there is for "word boundary".
Of course you could use just a normal capture group and a lookahead, instead of a lookbehind and a lookahead, which is probably a lot simpler:
re.sub(r'(.)(?=.)', r'\1-', s)
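A hedged sketch of the capture-group version wrapped in a function follows. The name `dash_join_capture` and the test strings are illustrative assumptions. The re.DOTALL flag is an addition not present in the answer: by default . does not match a newline, so without it a string containing '\n' would not behave like '-'.join.

```python
import re

def dash_join_capture(s):
    # Capture each character that is followed by another character and
    # re-emit it with a trailing dash. re.DOTALL lets '.' match '\n' too.
    return re.sub(r'(.)(?=.)', r'\1-', s, flags=re.DOTALL)

# Hypothetical inputs covering word, non-word, and newline characters.
checks = ['abcdefghi', 'a b', '(a)', 'a\nb', 'x', '']

for s in checks:
    print(repr(dash_join_capture(s)), dash_join_capture(s) == '-'.join(s))
```

Because the last character is never followed by anything, it is left alone, which is what keeps a trailing dash from appearing.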
Upvotes: 5