Naveen Reddy Marthala

Reputation: 3123

Removing a symbol only when it is directly before or after a word in Python

I would like to remove colons (:) that directly precede or follow a word, but keep the colons that sit between numbers (e.g., in a time). In other words, the colons in times should stay; only the colons attached to words should be removed.

Example Input:

2018-05-21T00:00:00+02:00
00:00:00
00:00
:light at 07:15:0000

:occurence1
:occurence2

light: at 07:15:0000

occurence1:
occurence2:

Expected output (with the colons before/after words removed):

2018-05-21T00:00:00+02:00
00:00:00
00:00
light at 07:15:0000

occurence1
occurence2

light at 07:15:0000

occurence1
occurence2

I can match them with the regex pattern ([A-Za-z_]\w*:|:[A-Za-z_]\w*), but I haven't been able to remove only the colons with re.sub().

How do I do this in Python 3.8, with or without regex?

Upvotes: 0

Views: 125

Answers (1)

The fourth bird

Reputation: 163457

In the part [A-Za-z_]\w*: the \w also matches digits, so the pattern matches too much. In the second alternative, you can omit the trailing \w*, as it is optional and not needed for the replacement.

You are also matching the whole part instead of only the :, so using that match with re.sub removes everything that was matched, the word included.

You can use 2 separate capture groups. The first group captures a letter or underscore plus optional digits and is followed by a : that is not itself followed by a digit. The second group captures the letter or underscore right after a leading :. Then use both groups in the replacement.
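To make both problems concrete, here is a small sketch that runs the pattern from the question with an empty replacement. The variable names and the two sample strings are picked for illustration only.

```python
import re

# The pattern from the question, unchanged.
pattern = r"([A-Za-z_]\w*:|:[A-Za-z_]\w*)"

# Problem 1: the whole match (word plus colon) is replaced, so the word disappears.
removed_word = re.sub(pattern, "", ":light at 07:15:0000")
print(repr(removed_word))  # ' at 07:15:0000'

# Problem 2: \w* also consumes digits, so "T00:" in the timestamp is matched and removed.
removed_time = re.sub(pattern, "", "2018-05-21T00:00:00+02:00")
print(repr(removed_time))  # '2018-05-2100:00+02:00'
```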

([A-Za-z_]\d*):(?!\d)|:([A-Za-z_])


import re

regex = r"([A-Za-z_]\d*):(?!\d)|:([A-Za-z_])"

s = ("2018-05-21T00:00:00+02:00\n"
     "00:00:00\n"
     "00:00\n"
     ":light at 07:15:0000\n\n"
     ":occurence1\n"
     ":occurence2\n\n"
     "light: at 07:15:0000\n\n"
     "occurence1:\n"
     "occurence2:")

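# Put back whichever group matched; the group that did not
# take part in the match is replaced by an empty string (Python 3.5+).
result = re.sub(regex, r"\1\2", s)

print(result)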

Output

2018-05-21T00:00:00+02:00
00:00:00
00:00
light at 07:15:0000

occurence1
occurence2

light at 07:15:0000

occurence1
occurence2
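The replacement \1\2 works because only one of the two groups takes part in any given match, and since Python 3.5 re.sub substitutes an empty string for a group that did not participate. A minimal sketch to inspect this, assuming three illustrative sample strings (the `samples` and `found` names are not part of the answer above):

```python
import re

regex = r"([A-Za-z_]\d*):(?!\d)|:([A-Za-z_])"

samples = ["light: at 07:15", ":light", "occurence1:"]

# Record the matched text and the contents of both groups for each sample.
found = {}
for text in samples:
    m = re.search(regex, text)
    found[text] = (m.group(0), m.groups())
    print(f"{text!r}: matched {m.group(0)!r}, groups {m.groups()}")

# 'light: at 07:15': matched 't:', groups ('t', None)
# ':light': matched ':l', groups (None, 'l')
# 'occurence1:': matched 'e1:', groups ('e1', None)
```

In every case exactly one group is filled, so \1\2 writes back the captured characters and drops only the colon.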

A lookaround variant, as commented by Wiktor Stribiżew, with an extra alternation that asserts a digit to the left and no digit to the right. Because it matches only the colon itself, the replacement is an empty string.

(?<=[A-Za-z]):|:(?=[A-Za-z])|(?<=\d):(?!\d)
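A short sketch of how this lookaround pattern could be used with re.sub. The shortened sample string and the `lookaround` and `trailing` names are assumptions made for this illustration.

```python
import re

lookaround = r"(?<=[A-Za-z]):|:(?=[A-Za-z])|(?<=\d):(?!\d)"

s = (":light at 07:15:0000\n"
     "light: at 07:15:0000\n"
     "occurence1:\n"
     "00:00:00")

# Only the colon is matched, so it is replaced with nothing.
result = re.sub(lookaround, "", s)
print(result)
# light at 07:15:0000
# light at 07:15:0000
# occurence1
# 00:00:00

# The third alternative also drops a colon that trails a number.
trailing = re.sub(lookaround, "", "07:15:")
print(trailing)  # 07:15
```

Two differences from the capture-group pattern are worth keeping in mind: this variant does not treat _ as part of a word, and it removes a colon that follows a number when no digit comes after it.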


Upvotes: 1
