Reputation: 3123
I would like to remove colons (:) that directly precede or follow a word, but keep the colons next to numbers (e.g., in times). In other words, the colons inside times must stay; only the colons attached to words should be removed.
Example Input:
2018-05-21T00:00:00+02:00
00:00:00
00:00
:light at 07:15:0000
:occurence1
:occurence2
light: at 07:15:0000
occurence1:
occurence2:
Expected output (colons before or after words removed):
2018-05-21T00:00:00+02:00
00:00:00
00:00
light at 07:15:0000
occurence1
occurence2
light at 07:15:0000
occurence1
occurence2
I could match them with the regex pattern ([A-Za-z_]\w*:|:[A-Za-z_]\w*), but I haven't been able to remove only the colons with re.sub().
How do I do this in Python 3.8, with or without regex?
Upvotes: 0
Views: 125
Reputation: 163457
In the part [A-Za-z_]\w*: the \w also matches digits, so the pattern matches too much. In the second alternative, you can omit the trailing \w* because it is optional and not needed for the replacement.
You are also matching the whole part instead of only the parts without the :, so replacing the match with re.sub removes everything that was matched.
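To make the two problems above concrete, here is a small sketch that runs the pattern from the question through re.sub with an empty replacement. The variable names are only illustrative; the inputs are taken from the question's sample data.

```python
import re

# The pattern from the question: it matches the word together with its colon.
original = r"([A-Za-z_]\w*:|:[A-Za-z_]\w*)"

# "T00:" is matched because \w* also accepts the digits "00",
# so part of the timestamp is removed.
damaged_timestamp = re.sub(original, "", "2018-05-21T00:00:00+02:00")
print(damaged_timestamp)  # 2018-05-2100:00+02:00

# The whole match ":light" is removed, not just the colon.
damaged_word = re.sub(original, "", ":light at 07:15:0000")
print(repr(damaged_word))  # ' at 07:15:0000'
```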
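You can use two capture groups instead. The first alternative captures a letter followed by optional digits and then matches a : that is not followed by a digit. The second alternative matches a : followed by a letter and captures that letter. Use both groups in the replacement.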
([A-Za-z_]\d*):(?!\d)|:([A-Za-z_])
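import re

# Group 1: a letter plus optional digits, followed by a colon with no digit after it.
# Group 2: a letter directly after a colon.
regex = r"([A-Za-z_]\d*):(?!\d)|:([A-Za-z_])"
s = ("2018-05-21T00:00:00+02:00\n"
     "00:00:00\n"
     "00:00\n"
     ":light at 07:15:0000\n\n"
     ":occurence1\n"
     ":occurence2\n\n"
     "light: at 07:15:0000\n\n"
     "occurence1:\n"
     "occurence2:")
# Put back only the captured text; the group that did not take part in the
# match is substituted as an empty string (Python 3.5+).
result = re.sub(regex, r"\1\2", s)
if result:
    print(result)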
Output
2018-05-21T00:00:00+02:00
00:00:00
00:00
light at 07:15:0000
occurence1
occurence2
light at 07:15:0000
occurence1
occurence2
A lookaround variant, as commented by Wiktor Stribiżew, with an extra alternation that matches a colon with a digit to the left and no digit to the right:
(?<=[A-Za-z]):|:(?=[A-Za-z])|(?<=\d):(?!\d)
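Because the lookaround variant matches only the colon itself, the replacement can be an empty string and no groups are needed. A minimal sketch, assuming a helper name (strip_word_colons) and a shortened sample that are not part of the original answer:

```python
import re

# Lookarounds consume nothing, so each match is exactly one colon.
LOOKAROUND = re.compile(r"(?<=[A-Za-z]):|:(?=[A-Za-z])|(?<=\d):(?!\d)")

def strip_word_colons(text):
    """Remove colons attached to words; keep colons between digits."""
    return LOOKAROUND.sub("", text)

sample = ("2018-05-21T00:00:00+02:00\n"
          ":light at 07:15:0000\n"
          "occurence1:\n"
          "light: at 07:15:0000")
cleaned = strip_word_colons(sample)
print(cleaned)
```

Unlike the capture-group pattern, this variant does not treat an underscore as part of a word; add _ to the character classes if that matters for your data.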
Upvotes: 1